The GridSite Security Framework
Andrew McNab and Shiv Kaushal
Department of Physics and Astronomy, University of Manchester, Manchester, UK
Abstract
We describe the architecture of the GridSite system, which adds support for several Grid security
protocols to the Apache web server platform. These include the Globus GSI authentication system,
GACL and XACML access policy files, and DN Lists and VOMS group memberships. The system
was originally developed for controlling access to Web sites using Grid credentials, but has now
been extended to support Web Services written in any of the languages which can be hosted by
Apache. We use the example of a proxy delegation service developed in conjunction with the
EGEE project to explain how such Web Services can be built using GridSite/Apache and a SOAP
toolkit such as gSOAP. To support high speed access to large data files, GridSite also supports an
HTTP Downgrade protocol, which we present. Finally, we describe GridSite's method of using
Unix pool accounts to provide partial “sandboxing” of services, which allows remote users to
deploy services in the form of scripts and native executables into a third-party hosting service built
with GridSite, a model we refer to as GRACE.
1. Introduction
GridSite [1] was originally developed as the
management system for the GridPP [2] project's
website. The original architecture was a result
of the special requirements of GridPP and its
need for security to span both Web and Grid
environments.
Largely due to the influence of the Globus
Project [3], large production grids such as the US
Open Science Grid [4] and the world-wide LHC
Computing Grid [5] have used authentication
infrastructures based on X.509 [6] certificates and
the SSL/TLS [7] protocol. The basis of this
infrastructure was adapted from the Web, and
the only differences from conventional e-commerce HTTPS websites (such as Amazon or
eBay) have been the greater use of client-side
certificates rather than usernames and
passwords, and the Globus proxy extensions to
X.509 certificates, described in IETF RFC3820.
Grid authorization tools and protocols built on
top of this authentication infrastructure have
provided ways of managing and publishing
group membership information (VO-LDAP,
VOMS), of proving group membership via
attribute certificates (VOMS), and of
describing access policies (GACL) [8].
The first phase of GridSite development that we
describe in this paper involved adding these
grid extensions, to what was originally a Web
infrastructure, back into the industry-standard
Apache [9] web server. This has allowed
credentials from the users' Grid activities to be
used when accessing or modifying otherwise
conventional websites.
The current phase, which this paper
concentrates on, reflects the move within
production grids from binary protocols to those
based on SOAP Web Services, and is largely
concerned with programmatic access to data
and services. In this, we are providing support
for the reuse of mainstream web tools in grid
environments.
Finally, we outline the GRACE model for
deploying scripts and binary executables on
third party hosting service sites using GridSite.
2. Website access control
2.1 Authentication
Users are identified by X.509 certificates loaded
into unmodified web browsers. With commonly
used browsers such as Firefox or Internet
Explorer, several ways of storing the private key
may be used, such as in an encrypted keys file
or in an external hardware token.
2.2 Virtual Organisations
Virtual Organisation and group memberships are
defined by lists of members' certificate names
(“DN Lists”) or by the Fully Qualified Attribute
Names of VOMS attribute certificates.
GridSite stores DN Lists in plain text files,
and refers to them by LDAP or HTTPS URLs.
DN Lists can be retrieved asynchronously from
remote authorization servers, or managed
locally using the tools described below.
The certificate DNs of authenticated users
are simply matched with the file containing the
relevant DN List to determine their VO and
group memberships.
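The matching step described above can be sketched as follows. This is only an illustration, assuming one DN per line in each plain text DN List; the function names, example URLs and DNs are hypothetical and are not taken from the GridSite code.

```python
from pathlib import Path
import tempfile


def load_dn_list(path):
    """Return the set of DNs in a plain text DN List (assumed: one DN per line)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def groups_for_dn(user_dn, dn_lists):
    """Return the URLs of every DN List that contains user_dn.

    dn_lists maps a DN List URL to the local file holding its members.
    """
    return sorted(url for url, path in dn_lists.items()
                  if user_dn in load_dn_list(path))


def demo():
    """Build two small DN Lists in a temporary directory and look up one user."""
    with tempfile.TemporaryDirectory() as tmp:
        admins = Path(tmp) / "admins"
        admins.write_text("/C=UK/O=Example/CN=alice\n", encoding="utf-8")
        users = Path(tmp) / "users"
        users.write_text("/C=UK/O=Example/CN=alice\n/C=UK/O=Example/CN=bob\n",
                         encoding="utf-8")
        dn_lists = {
            "https://example.org/dn-lists/admins": admins,
            "https://example.org/dn-lists/users": users,
        }
        return groups_for_dn("/C=UK/O=Example/CN=bob", dn_lists)
```

Here `demo()` reports that the second user belongs only to the "users" list, since an exact string match on the DN is the whole test.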
For handling VOMS attribute certificates, a
simple parser of X.509 attribute certificates in
ASN.1 [10] format was written. This relies on the
invariant layout of the ASN.1 tree of objects in
attribute certificates. Rather than define the full
set of callback functions for all the ASN.1
objects which these certificates can contain, a
simpler approach was used: the ASN.1 tree of
data nodes is unrolled, each node is assigned a co-ordinate, and the invariant co-ordinate of each
node is used to find its value when required.
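The unrolling idea can be illustrated with a small DER walker. This is a sketch under stated assumptions rather than GridSite's parser: the co-ordinate notation ("1-2-1" for the first child of the second child of the root) is our own choice, only single-byte tags are handled, and the sample bytes are a toy structure, not a real attribute certificate.

```python
def read_tlv(data, pos):
    """Read one DER tag-length-value at pos; return (tag, value_start, value_end)."""
    tag = data[pos]
    length = data[pos + 1]
    pos += 2
    if length & 0x80:                      # long form: low 7 bits give the byte count
        count = length & 0x7F
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    return tag, pos, pos + length


def unroll(data, start=0, end=None, prefix=""):
    """Flatten a DER tree into {co-ordinate: (tag, value bytes)}."""
    end = len(data) if end is None else end
    nodes = {}
    index = 0
    pos = start
    while pos < end:
        index += 1
        tag, vstart, vend = read_tlv(data, pos)
        coord = f"{prefix}-{index}" if prefix else str(index)
        nodes[coord] = (tag, data[vstart:vend])
        if tag & 0x20:                     # constructed type: descend into children
            nodes.update(unroll(data, vstart, vend, coord))
        pos = vend
    return nodes


# Toy input: SEQUENCE { INTEGER 5, SEQUENCE { UTF8String "vo" } }
SAMPLE = bytes.fromhex("300902010530040c02766f")
NODES = unroll(SAMPLE)
```

Once the tree is unrolled, a value at a known position is a dictionary lookup such as `NODES["1-2-1"]`, which only works because the layout of the tree does not change between certificates.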
2.4 Access Policies
An XML Grid Access Control Language
(GACL) was developed for use with GridSite
and other components of the European DataGrid
project. For use with websites, the GACL file
governing a particular directory or hierarchy is
simply stored in that directory as a file .gacl.
This policy file allows read, write, list and
admin access (giving the ability to modify the
policy itself) to be granted or denied on the
basis of X.509 identity, GSI proxies, DN List
membership, or the possession of a VOMS
attribute certificate.
As an alternative, GridSite now also
supports simple XACML [11] policies, which are
restricted to have the same content as GACL
policies. Nevertheless, they are syntactically
correct and can also be evaluated by Sun's
reference XACML implementation in Java, for
example.
When GridSite reads a policy, stored in the
.gacl file or otherwise, it determines whether
GACL or XACML is present and transparently
uses the correct parser. When writing policies,
the choice of GACL or XACML can be set by
the webserver administrator on a per-directory
basis.
2.5 Management tools
Providing straightforward website management
tools has been central to the success of GridSite.
A CGI executable is provided which allows
authorized users to edit HTML pages in place
through their web browser, upload files, and
manage directories. The history of each file and
previous versions are recorded for auditing and
giving credit. The management tools also
include an editor for access policies, which
allows GACL or XACML policies to be
constructed using HTML forms and menus,
rather than by hand-editing XML files. An
editor is also provided for locally-managed DN
Lists.
2.6 Programmatic file copying
HTTP servers are frequently used as file
servers, to distribute binary files rather than just
human-readable web pages. Tools such as wget
and curl exist to enable these kinds of
downloads at the Unix command line, and are
suitable for using with scripts.
When GridSite is being used and users can
be properly authenticated and authorized
according to access policies, it is practical to
also support the HTTP PUT method which
allows direct uploading of files. GridSite adds
support for HTTP PUT to Apache, subject to
the access controls described above.
To complement this, GridSite also includes
htcp, htls, htll and htrm – a suite of command
line tools similar to scp which allow files to be
copied, listed or deleted via HTTP(S).
2.7 Dynamic Content
GridSite provides a small amount of formatting
for HTML pages, by inserting standard headers
and footers and adding links to available page
management functions depending on the user's
level of authorization.
However, GridSite can also co-exist with
richer dynamic content systems such as CGI
scripts and PHP server-side processing. In this
case, GridSite provides access control as before,
but the dynamic content operates as normal. If
finer grained authorization is required, then the
results of GridSite's credential parsing and
evaluation are available to scripts as CGI
environment variables.
Later in this paper, we describe how this
feature has been used to provide support for grid
security protocols to Web Services running on
Apache web servers.
3. HTTP Downgrade
One of the requirements for High Energy
Physics use of the grid is for the distribution of
terabyte scale datasets in the form of many files
which are several gigabytes in size. When
transferring such large datasets between sites,
special attention must be given to achieving
balanced performance from each component in
the stack of disk, operating system, server
software, network protocol and network
infrastructure to avoid bottlenecks. The HTTP
Downgrade mechanism, which GridSite
supports, provides a way of using optimised
disk to network transfers and using the
mainstream Apache web server.
3.1 Data channels
It has been common practice in existing
High Energy Physics grids to use the GridFTP
protocol for bulk data transfers. GridFTP
provides X.509/GSI based authentication for the
control channel, and an unencrypted data
channel. Whilst this avoids the CPU overhead of
encrypting the data channel, further
performance benefits could be gained by using
operating system features such as the sendfile()
system call, which transfers data directly from
disk to network without copying into user-owned memory.
The Apache webserver implements exactly
this architecture for the HTTP protocol.
Preliminary experiments on wide area networks
have indicated that HTTP/Apache can have
better performance for bulk data transfers than
GridFTP [12].
However, HTTP provides only limited
authentication methods – essentially usernames
and passwords – which aren't part of the
authentication and authorization infrastructure
developed for the Grid. For this reason, it has
been necessary to develop a way of establishing
a secure channel to provide authentication and
authorization.
3.2 The HTTP Downgrade protocol
In the HTTP Downgrade protocol we have
developed, an HTTPS request is made to the
server for the file in question, with a new HTTP
header, HTTP-Downgrade-Size, present. This
header notifies the server that the client
understands the downgrade protocol and
specifies the minimum size of file to transfer via
HTTP rather than HTTPS. (Small files should
be returned in the response to the HTTPS
request, since establishing a new connection has
an overhead of its own.)
If the server chooses to return the file via
HTTP, then it issues a standard HTTP redirect
response to the client, giving the URL of an
HTTP copy of the file. In the general case, this
need not be on the same machine as the HTTPS
server contacted for the initial request: it could
even be one of a farm of data servers attached to
a master HTTPS server.
Along with the redirect response, the server
sends the client an HTTP cookie containing a
one-time passcode which must be used when
requesting the file via HTTP. In the GridSite
implementation, the passcode is a random
number stored by the HTTPS server and deleted
when it is first used in an HTTP request. Since
the file name in question is also stored, the
passcode cannot be used to obtain other files.
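The server-side bookkeeping described above might look like the following sketch. It is not the GridSite implementation: the class and function names are hypothetical, the passcodes are kept in an in-memory dictionary for brevity, and we assume the HTTP-Downgrade-Size header carries a decimal byte count and that a file of exactly that size qualifies for downgrade.

```python
import secrets


def should_downgrade(headers, file_size):
    """Decide whether to redirect to HTTP for a file of file_size bytes.

    Assumption: the header value is a decimal number of bytes.
    """
    value = headers.get("HTTP-Downgrade-Size")
    if value is None:
        return False                     # client does not understand the protocol
    try:
        minimum = int(value)
    except ValueError:
        return False                     # unparseable value: stay on HTTPS
    return file_size >= minimum


class DowngradeStore:
    """Holds one-time passcodes, each bound to the file it was issued for."""

    def __init__(self):
        self._pending = {}               # passcode -> file path

    def issue(self, path):
        """Create a random passcode to send in the cookie with the redirect."""
        passcode = secrets.token_hex(16)
        self._pending[passcode] = path
        return passcode

    def redeem(self, passcode, path):
        """Return True only for the first use of a passcode on its own file."""
        stored = self._pending.pop(passcode, None)   # deleted on first use
        return stored is not None and stored == path
```

Because `redeem` removes the passcode whatever the outcome, a replayed cookie or a cookie presented for a different file is refused.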
To retrieve the file, the client simply
presents the passcode in an HTTP Cookie
header. Because the HTTP protocol aspects of this
mechanism are standard HTTP (with the
exception of the initial HTTP-Downgrade-Size
header), unmodified HTTP clients can be
used to perform these transfers. For example,
the curl [13] Unix command-line client can be used
without modification by simply using its option
to insert custom headers in requests.
This architecture has many further
possibilities, such as clients which use multiple
data streams to fetch multiple blocks of a file in
parallel using the HTTP Range header, but
which maintain a single HTTPS control channel
to obtain the necessary redirection URLs and
passcodes. Since HTTP and HTTPS support
persistent connections, with multiple requests
transmitted in series, this arrangement avoids
the overheads associated with repeatedly
opening and closing TCP/IP connections and
establishing SSL/TLS encrypted sessions.
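As an illustration of the parallel-block idea, a client could split a file of known size into byte ranges and request each one with its own Range header. The helper below is a hypothetical sketch; the block size and the way blocks are assigned to streams are left to the client.

```python
def block_ranges(file_size, block_size):
    """Return HTTP Range header values that cover file_size bytes in blocks."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    ranges = []
    for start in range(0, file_size, block_size):
        end = min(start + block_size, file_size) - 1   # Range ends are inclusive
        ranges.append(f"bytes={start}-{end}")
    return ranges
```

For a 10 byte file and 4 byte blocks this gives `bytes=0-3`, `bytes=4-7` and `bytes=8-9`; each block would still need its own redirect and passcode from the HTTPS control channel.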
4. Web services
Protocols based on Web Services provide
important benefits for Grids, particularly in
avoiding the tendency that proprietary binary
protocols frequently become closely tied to
particular implementations or languages. The
language neutrality of Web Services has
received less emphasis so far, since most
development of Web Services for Grids has
been done in the Java language. However, due to
Apache's support for dynamic content created
by a wide variety of languages,
GridSite/Apache is able to support secure web
services written in C, C++, Perl, Python and
other scripting languages.
4.1 Modular architecture
Apache supports two main classes of content:
static web pages and files, and dynamic content
obtained by running a CGI program and
returning its output to the client. Each request is
processed by a chain of modules which modify
the input or output stream of data in some way.
The GridSite extensions have been
implemented as an Apache module. Other
architectures were considered, such as filters (in
which an external program is called to
transform the output stream) or providing
library functions which CGI programs can call
to parse grid security credentials. However,
these either suffer from performance penalties
or complicate the CGI interface, or both.
The GridSite module makes preliminary
access control decisions as described above in
the description of web sites, but also exports the
parsed grid security credentials and the
permissions granted according to the governing
GACL or XACML policy. This information is
exported as environment variables, using the
same mechanism as used by the CGI API itself:
the request and response are communicated via
the stdin and stdout of the CGI process, and out
of band information, such as authentication or
remote network addresses, are communicated
via environment variables.
This means that all the languages suitable
for writing CGI executables and scripts are
immediately able to access GridSite's evaluation
of X.509, GSI, VOMS, GACL and XACML
credentials and applicable policies.
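A CGI script in any language can therefore make finer-grained decisions just by reading its environment. The Python sketch below shows the pattern. The variable names EXAMPLE_GRID_DN and EXAMPLE_GRID_PERMS are placeholders invented for this illustration (the paper does not give the names GridSite exports), and the space-separated permission format is likewise an assumption.

```python
import os
import sys


def handle_request(environ, body):
    """Build a CGI response from the credentials found in the environment."""
    # Placeholder variable names: substitute whatever the server really exports.
    dn = environ.get("EXAMPLE_GRID_DN", "")
    perms = set(environ.get("EXAMPLE_GRID_PERMS", "").split())
    if "read" not in perms:
        return ("Status: 403 Forbidden\r\n"
                "Content-Type: text/plain\r\n\r\n"
                "read permission required\n")
    return ("Content-Type: text/plain\r\n\r\n"
            f"Hello {dn}, received {len(body)} bytes\n")


if __name__ == "__main__":
    # Standard CGI: the request body arrives on stdin, the response goes to stdout.
    length = int(os.environ.get("CONTENT_LENGTH") or 0)
    request_body = sys.stdin.buffer.read(length)
    sys.stdout.write(handle_request(os.environ, request_body))
```

Keeping the logic in `handle_request` lets it be exercised with a plain dictionary, without running a web server.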
4.2 Delegation portType
The Delegation portType implementation
included in GridSite illustrates how web
services can be built using C and the gSOAP
toolkit.
This protocol was agreed within the EGEE
project [14], and both C (GridSite) and Java
implementations are available as part of EGEE's
gLite framework. The protocol itself is very
similar to the “G-HTTPS” HTTPS delegation
extensions developed as part of earlier versions
of GridSite, but recast as Web Services.
To perform a delegation, the client sends a
Get Proxy Request message to the server, which
causes the server to generate a public and
private key and return an X.509 certificate
signing request containing the public key.
The client then signs this request using its
own private key and certificate, and sends a Put
Proxy message back to the server, containing
the signed certificate. Together the private key
which the server generated (and which has not
crossed the network) and the new certificate
form an RFC3820 / GSI proxy.
gSOAP provides tools for generating the
WSDL description of such a Web Service from
the C header files of the functions which
implement it; or vice versa. With a consistent
WSDL description and populated callback
functions which implement the actual X.509
key and signing functions, an executable can be
built which will operate as a CGI program.
This means that SOAP requests received by
Apache will be fed into the stdin of the CGI
program and the SOAP response will be taken
from the stdout. Since all the required
authentication information is made available as
environment variables by GridSite, the CGI
Web Service program can obtain these directly
without needing to be linked to the GridSite
library.
However, in the case of the delegation
service, there are two levels of credential
processing taking place: authentication and
authorization of the client attempting the
delegation, and then generation of the new
credentials themselves. For this reason, in the
special case of the GridSite delegation service,
the executable is linked to the GridSite library,
to obtain access to its private key and certificate
handling functions.
The delegation service operates as a service
with a single portType. However, since the bulk
of the code necessary for delegation is part of
the GridSite library, it is straightforward to add
a delegation portType to other services which
require delegation to function.
If the standalone delegation service is used,
then a mechanism is needed to share the
credentials with other services. To facilitate this,
the delegated credentials are stored in the local
filesystem, identified by a Delegation Session
ID specified during the delegation process.
5. Native execution for services
5.1 Jobs and pool accounts
In large production grids such as the LHC
Computing Grid, there has been a focus on
providing support for jobs written as scripts and
native binary executables. This has partially
reflected the heritage of the applications of
these grids, such as High Energy Physics, with
its large investment in Fortran/C/C++ analysis
and simulation codes.
For this reason, effort has been put into
providing native execution environments at
remote sites on the grid. One of the issues this
approach must deal with is the danger that
careless or malicious jobs from one user will
interfere with other users' files or programs, and
the pool account system developed by one of us
(A.M.) for EDG and adopted by LCG and
EGEE has provided one solution.
5.2 CGI scripts and suexec
A not dissimilar problem has been faced in the
mainstream web world, where web server
administrators have needed to host CGI
executables provided by multiple users (perhaps
in a commercial, third-party hosting service,
where no trust relation exists beyond monthly
credit card payments of hosting charges.)
The Apache software provides a solution for
this by allowing CGI scripts or executables to
be run as different Unix users at the level of
each virtual host (each apparent website.) This
mechanism, named suexec after the wrapper
program which it relies on, is widely used but is
tied to fixed configuration decisions made when
the Apache web server is started.
5.3 Combining pool accounts and suexec
One of our goals has been to provide support for
third-party hosting of Web Services for grids,
even when the service is written as a script or
native executable. This requires some form of
sandboxing of users, to prevent them interfering
with each other's files or programs, in the same
way as must be prevented for remote batch jobs.
To do this, we have combined the pool
account system with the Apache suexec
mechanism (renamed to gsexec.) As well as
providing legacy support compatible with
Apache's default, this allows two new modes of
operation.
5.4 Modes of operation
First, a CGI web service can be executed as a
Unix pool user associated with the authenticated
identity of the client, that is, based on their
X.509 certificate or GSI proxy. The lock files
associated with the pool mechanism mean that
the same client certificate will be associated
with the same pool account on subsequent
requests (until the account lease expires, and the
file space associated with the account is
recycled.) This allows services to maintain
internal session information in the form of
temporary files owned by pool users, and
protected from interference by the Unix file
permissions system. (It can also be used for
other user-like permission systems, such as
MySQL databases.)
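The way lock files can tie a client identity to a pool account may be sketched as below. This is a simplified illustration, not the pool account implementation used by GridSite: the file naming scheme and function names are hypothetical, lease expiry and recycling are omitted, and two simultaneous first requests from the same identity are not handled.

```python
import os
import tempfile
from urllib.parse import quote


def lease_account(pool_dir, accounts, user_dn):
    """Return the pool account leased to user_dn, claiming a free one if needed."""
    user_file = os.path.join(pool_dir, "user-" + quote(user_dn, safe=""))
    if os.path.exists(user_file):
        # An existing lease: the same identity maps to the same account.
        with open(user_file, encoding="utf-8") as handle:
            return handle.read()
    for account in accounts:
        lock_file = os.path.join(pool_dir, "account-" + account)
        try:
            # Mode "x" fails if the lock exists, so only one identity can claim it.
            with open(lock_file, "x", encoding="utf-8") as handle:
                handle.write(user_dn)
        except FileExistsError:
            continue                      # already leased to someone else
        with open(user_file, "w", encoding="utf-8") as handle:
            handle.write(account)
        return account
    raise RuntimeError("no free pool accounts")


def demo():
    """Lease accounts for two identities, then repeat the first request."""
    with tempfile.TemporaryDirectory() as pool_dir:
        accounts = ["pool001", "pool002"]
        first = lease_account(pool_dir, accounts, "/C=UK/O=Example/CN=alice")
        second = lease_account(pool_dir, accounts, "/C=UK/O=Example/CN=bob")
        again = lease_account(pool_dir, accounts, "/C=UK/O=Example/CN=alice")
        return first, second, again
```

In `demo()` the repeated request for the first identity returns the account it was given originally, which is the property that lets a service keep per-user temporary files between requests.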
In the second mode, a pool account is
associated with the CGI web services stored in a
particular directory. This means that for every
remote client, the same Unix account will be
used (and the CGI services are therefore
responsible for maintaining separation between
the sessions of different authenticated users.)
This mode is intended to support third-party
services, where user A is given write access to a
directory capable of hosting CGI services.
Service scripts or executables can be deployed
by simply uploading them using GridSite's
manual or programmatic interfaces, and then the
service can access requests from other users,
B1, B2, ... . Because A's service runs as a
dedicated pool account, if another user C also
has the ability to deploy services to their own
directory, then C still cannot interfere with A's
files from their distinct pool account.
Without these mechanisms, either all the
services must run as the same “apache” or
“nobody” Unix account, which permits conflicts
between users' actions, or each user must be
configured individually by the site
administrator, which requires that the server is
shut down, all sessions are stopped and the
server is started with the new configuration.
6. GRACE
This combination of the ability to manage grid-facing access permissions through GridSite, and
local file access permissions via pool accounts
allows us to define a new execution model for
Web Services in grid environments, which we
refer to as GRACE (“GRidSite, Apache, CGI-scripts and Executables.”)
GRACE offers an alternative to the reliance
on Java for webservices, and is especially
attractive to applications which have a large
investment in executable code, or have
performance requirements which are not suited
to current implementations of Java.
Furthermore, the ability to use standard
scripting languages such as Perl, Python and
even PHP to provide Web Services offers
possibilities of rapid prototyping of simple
services, in languages which site administrators
and scientists typically use for day to day
automation tasks.
Acknowledgements
This work was funded by the Particle
Physics and Astronomy Research Council
through their GridPP and e-Science Studentship
programmes.
We would also like to thank other members
of the various EDG and EGEE security working
groups for providing much of the wider
environment into which this work fits.
References
1. A previous version of GridSite is described
in “The GridSite Web/Grid security system”,
A. McNab, Softw. Pract. Exper. 2005;
35:827-834. GridSite software is available
from http://www.gridsite.org/
2. For more about the GridPP project, see
http://www.gridpp.ac.uk/
3. The Globus Project: http://www.globus.org/
4. The Open Science Grid:
http://www.opensciencegrid.org/
5. The LHC Computing Grid:
http://lcg.web.cern.ch/LCG/
6. X.509v3 is described in IETF RFC2459,
“Internet X.509 Public Key Infrastructure
Certificate and CRL Profile.”
7. TLS, the most recent version of SSL, in
RFC2246, “The TLS Protocol Version 1.0”
8. VO-LDAP, VOMS, DN Lists and GACL
are all described in the EDG Security Coordination Group's paper, “Authentication
and Authorization Mechanisms for Multi-Domain Grid Environments”, L. A.
Cornwall et al., Journal of Grid Computing
(2004) 2: 301-311.
9. The Apache Web Server:
http://httpd.apache.org/
10. RFC 3281 “An Internet Attribute Certificate
Profile for Authorization”
11. Information about XACML specifications
and implementations can be found at:
http://www.oasis-open.org/committees/xacml/
12. Richard Hughes-Jones, private
communication and talk at GNEW 2004.
13. Curl and libcurl: http://curl.haxx.se
14. The EGEE (Enabling Grids for E-SciencE)
Project: http://public.eu-egee.org/