Web Services RAT Report

Gateways Web Services Requirements
Compiled by Ivan R. Judson
Overview of the Emerging Gateway Model
This document compiles the Web Services requirements identified by the initial
gateways. In conversations over recent quarters, the gateway groups and the
TeraGrid Security group have negotiated a common conceptual framework for
gateways: what information the gateways are responsible for, what information
the TeraGrid is responsible for, and how these disparate pieces of information
are reconciled in the event of a crisis, an accounting request, or an inquiry.
The agreed model has the resource providers (RPs) relinquish some first-hand
knowledge of who is running jobs on their resources in exchange for wider
usage. RPs are not left without any idea of who is using a resource, however:
the gateway must agree to keep at least as much information about its users as
the RPs would, and to provide an interface through which RPs can retrieve that
information. This amounts to a "Service Level Agreement" (SLA) between RPs and
Gateways which, if it includes appropriate liability, indemnification, and
intellectual property clauses, seems to completely define the necessary
relationship between the two without creating any more technical complexity
than already exists.
The largest motivating factor is scalability: as the number of users of the
initial gateways grows, and as the number of gateways possibly grows, the number
of "accounts" to keep track of becomes enormous. Even if the identification
mechanism is merely a DN (distinguished name), where "merely a DN" hides the
complexity of how the end user obtained it, manages it, and uses it, every RP
would still have to manage tens of thousands of DNs, and somewhere there would
have to be a user database that keeps track of all the requisite user
information associated with each DN.
The alternative approach that was discussed was actually an extension of the
SLA: requiring that every submitted job be "annotated" with a unique identifier
that RPs could use, if necessary, to retrieve the user information for that
job. However, it was quickly realized that the (resource, job id) pair is
already unique and already tracked by the gateway, so it serves as the
identifier without introducing any further annotations.
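As a minimal sketch of why the (resource, job id) pair can act as the identifier, assume the gateway records every submission in a table keyed by that pair. The names JobRegistry, record_submission, and lookup_user, the fields of the user record, and the example host names are all hypothetical; none of them come from a TeraGrid or gateway interface.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical key type: the (resource, job id) pair described in the text.
JobKey = Tuple[str, str]


@dataclass(frozen=True)
class GatewayUser:
    """Illustrative user record; a real gateway would keep more fields."""
    username: str
    email: str


class JobRegistry:
    """Gateway-side table mapping (resource, job id) to the submitting user."""

    def __init__(self) -> None:
        self._jobs: Dict[JobKey, GatewayUser] = {}

    def record_submission(self, resource: str, job_id: str, user: GatewayUser) -> None:
        key = (resource, job_id)
        if key in self._jobs:
            # The pair is assumed unique, so a repeat indicates a bookkeeping error.
            raise ValueError(f"duplicate job key: {key}")
        self._jobs[key] = user

    def lookup_user(self, resource: str, job_id: str) -> Optional[GatewayUser]:
        """Return the submitting user, or None if the job is unknown."""
        return self._jobs.get((resource, job_id))


registry = JobRegistry()
# The same job id on two resources does not collide, because the resource
# name is part of the key.
registry.record_submission("rp-a.example.org", "1001", GatewayUser("alice", "alice@example.org"))
registry.record_submission("rp-b.example.org", "1001", GatewayUser("bob", "bob@example.org"))
```

Because the resource name is part of the key, job ids only need to be unique within a single RP, which is what a local scheduler would typically already guarantee.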
Once this model was agreed upon, and all parties had been given time to consider
the alternatives, it became evident that this was the best way to proceed.
Using this model, most of the gateways have indicated that there is a set of
service interfaces that they need in order to serve their users adequately. All
of the gateways anticipate using web services technology to access these
services, and assume that the TeraGrid's implementation of Web Service
interfaces will address their needs.
Web Services Needs
In order for the gateways to provide sufficient services to their users, there is a
set of service interfaces that they expect to use. The assumption so far has
been that these interfaces will be accessed using standard web services
technology. The services are divided into two groups: those interfaces provided by
the TeraGrid and those provided by the Gateways.
By the TeraGrid
The list of services that have been identified by the gateway developers
includes:
• Resource Status Service (both polling and pub/sub)
• Job Submission Interface
  o The gateways expect this to be provided by WS-GRAM
• Job Tracking Interface (both polling and pub/sub)
• File/Data Staging Interface
• Retrieve Usage Information
• Retrieve Inca Info
• Advanced Reservation Interface
All of these interfaces are self-explanatory, and their necessity for the
features and functionality the gateways provide is clear; therefore, no further
explanation is provided.
(These were submitted by Sebastien, but I think they need clarification.)
• Cross-site Runs (?)
• Pushing DN to an RP (?)
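Several of the interfaces listed in this section are requested in both polling and pub/sub forms. The difference can be sketched in-process as follows. This is an illustration of the two access patterns only, not of any TeraGrid or Globus API: ResourceStatusService, get_status, subscribe, and set_status are hypothetical names, and a deployed service would expose them over web services rather than as local method calls.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback signature: (resource, new_status).
StatusCallback = Callable[[str, str], None]


class ResourceStatusService:
    """Toy status service offering both access patterns named in the text."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._subscribers: List[StatusCallback] = []

    def get_status(self, resource: str) -> str:
        """Polling: the gateway asks for the current status whenever it wants."""
        return self._status.get(resource, "unknown")

    def subscribe(self, callback: StatusCallback) -> None:
        """Pub/sub: the gateway registers once and is told about changes."""
        self._subscribers.append(callback)

    def set_status(self, resource: str, status: str) -> None:
        """Called on the provider side; notifies subscribers only on a change."""
        changed = self._status.get(resource) != status
        self._status[resource] = status
        if changed:
            for callback in self._subscribers:
                callback(resource, status)


service = ResourceStatusService()
events: List[Tuple[str, str]] = []
service.subscribe(lambda resource, status: events.append((resource, status)))

service.set_status("rp-a.example.org", "up")
service.set_status("rp-a.example.org", "up")    # unchanged, so no notification
service.set_status("rp-a.example.org", "down")
```

Polling keeps the provider simple but makes the gateway choose a query interval; pub/sub delivers changes promptly at the cost of the provider tracking its subscribers.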
By the Gateways
The list of services that have been identified by the gateway developers and the
TeraGrid Security group includes:
• Retrieve user information for a job
• Retrieve accounting information/statistics
These interfaces, whether programmatic, human-in-the-loop, or otherwise
implemented, provide the means to track down problem job submissions, identify
malicious users, and tabulate accounting and logging information for the RPs'
reporting needs. The input to the first interface is expected to be simply the
(resource, job id) pair that is known to both parties at job submission time.
The interface returns sufficient user information for the RPs to deal with the
situation at hand, and it suggests another interface that the gateways should
provide:
• Don't submit jobs from the user who submitted job (resource, job id) until
  we say it's OK.
The accounting interface requires no input, but returns sufficient accounting
information and statistics to report to funding agencies, program managers, etc.
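The "hold submissions" request and the accounting interface described above could be sketched on the gateway side as follows. This is a minimal illustration under assumed names: GatewayRecords and its methods are hypothetical, users are reduced to plain strings, and the "statistics" are only a per-resource job count, since the source does not specify which statistics are required.

```python
from collections import Counter
from typing import Dict, Set, Tuple

JobKey = Tuple[str, str]  # (resource, job id)


class GatewayRecords:
    """Hypothetical gateway-side bookkeeping for holds and accounting."""

    def __init__(self) -> None:
        self._owner: Dict[JobKey, str] = {}
        self._held: Set[str] = set()

    def submit(self, user: str, resource: str, job_id: str) -> bool:
        """Record a submission; refuse it while the user is on hold."""
        if user in self._held:
            return False
        self._owner[(resource, job_id)] = user
        return True

    def hold_user_of_job(self, resource: str, job_id: str) -> str:
        """RP request: stop submissions from whoever submitted this job.

        Raises KeyError if the gateway has no record of the job.
        """
        user = self._owner[(resource, job_id)]
        self._held.add(user)
        return user

    def release_user(self, user: str) -> None:
        """RP says it is OK for this user to submit again."""
        self._held.discard(user)

    def accounting_summary(self) -> Dict[str, int]:
        """Takes no input; returns the number of recorded jobs per resource."""
        return dict(Counter(resource for resource, _ in self._owner))


records = GatewayRecords()
records.submit("alice", "rp-a.example.org", "1")
records.submit("bob", "rp-a.example.org", "2")

held_user = records.hold_user_of_job("rp-a.example.org", "2")
rejected = records.submit("bob", "rp-b.example.org", "7")   # refused while held

records.release_user("bob")
accepted = records.submit("bob", "rp-b.example.org", "7")   # allowed again
summary = records.accounting_summary()
```

Note that the RP never needs the user's identity to request the hold; it supplies only the (resource, job id) pair it already has.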
Impact of GT4 Deployment
The impending deployment of GT4 will have a significant impact on how the
service interfaces identified above might be provided. However, gateway
developers are building gateways that interact with many resources; the
TeraGrid might be the largest of these, but it is not the only one. It is
important to recognize that the gateways themselves may or may not be built
using GT4, and thus the way the TeraGrid provides its interfaces should not
necessitate the adoption of GT4 by the gateway developers. In fact, providing
the interfaces via verifiably interoperable means will make them even more
valuable for the next set of gateways that are developed and deployed.
That being said, according to Mike Showerman, the plan for deploying GT4 is as
follows:
• CTSS candidate testing for GridFTP (hopefully 4.0.1, ASAP)
• Globus core 4.0.1 (before SC05)
• RFT (before SC05)
• Web Services GRAM (before SC05)
• WebMDS (before SC05)
• RLS (before SC05)
• CAS (before SC05)
• Pre-web-services GRAM (should be a redeployment of the existing GRAM
  environment, on alternate ports if before SC05, production install
  after SC05)
At first glance this appears to align nicely with what the gateway developers
identify as their needs. However, I'm not sure any of the gateway developers
are familiar enough with these services to know whether they meet the needs of
their gateway. As the developers work on their implementations, it would be a
good idea to maintain an ongoing dialogue about the service interfaces: where
they are sufficient, where they are insufficient, and what interfaces are
missing.
Conclusion
The relationship between the Gateways and the TeraGrid has been informally
defined for too long; this document characterizes that relationship explicitly.
It identifies the information each side is required to manage, the interfaces
for retrieving that information, and a prioritized list of interfaces to
deploy. This will allow the gateways to accelerate the growth of their user
communities, significantly increasing the impact of the TeraGrid on science.
Appendix: Prioritized List of Service Interfaces
Provided by the TeraGrid:
• Resource Status Service (both polling and pub/sub)
• Job Submission Interface (WS-GRAM?)
• Job Tracking Interface (both polling and pub/sub)
• File/Data Staging Interface
• Getting Usage Information
• Getting Inca Info
• Advanced Reservation
• Cross-site Runs
• Pushing DN to an RP
Provided by the Gateways:
• Retrieve user information for a job
• Register job status (running, stopped for security reasons, failing)?