Middleware Working Group Report Initial Evaluation & Future Directions Andrew N. Jackson

advertisement
Middleware Working Group Report
Initial Evaluation & Future Directions
Andrew N. Jackson
Chip Watson
ILDG2 - 2nd May 2003
1
Andrew N. Jackson
May 2003
Middleware Recap
 ILDG: A Grid-Of-Grids
 An aggregation of a number
of datagrids.
 Using a standardised webservice interface.
 Initially linking U.S. and U.K.
datagrids.
 To demonstrate basic contentlisting (‘ls’) by Lat03.
 Structure
 U.K.’s QCDgrid as a single entity.
 With many RO nodes and one
RW node.
 Every U.S. storage facility appears
as a separate entity.
 Gateway running on lqcd.org
provides the overall interface.
2
Client
ildgFileService
[A web service.]
U.S. SRM
U.S. SRM
U.S. SRM
QCDgrid
& others...
Andrew N. Jackson
May 2003
Progress
 Only gentle progress so far...
 Added basic WG overview on http://www.lqcd.org/ildg/.
 Andrew Jackson & Chip Watson (co-conveners of the MWG) have
been discussing overall status and direction.
 Andrew Jackson has (with James Perry’s help) written some
experimental code to test the Grid-Of-Grids concept.
 The ILDG-Middleware mailing list has been set up (this week!).
 ILDG Middleware WG <ildg-middleware@epcc.ed.ac.uk>
 Please contact me, Andrew Jackson <A.N.Jackson@ed.ac.uk> to be added to
this mailing list.
3
Andrew N. Jackson
May 2003
Required Interfaces
 The ILDG Services will be built on...
 Replica Catalogue
 Currently different incompatible implementations.
 Metadata Catalogue
 QCDgrid currently has its own metadata catalogue based on the eXist XML
database, queried via XPath. http://exist.sourceforge.net/.
Note that EDG-WP2 is creating a combined ‘ReplicaMetadataCatalog’.
 Storage Resource Manager
 EDG-PPDG are developing a standard.
 Implementation code will follow shortly afterwards.
 Such standards can hide the differences between Replica Catalogues.
 And perhaps Metadata Catalogues too.
 File-transfer daemons
 A solved problem, though there are many to choose from/support:
 http, ftp, gridftp, jparss, bbftp....
4
Andrew N. Jackson
May 2003
SRM 2.x
 The Storage Resource Manager Interface
 An emerging standard:
 v2.1 is being currently being co-written by EDG and PPDG.
 Broad functionality:
 Maps logical filenames (LFN) to:
 One or more physical locations (Storage File Name, SFN, of the form
sfn://srm.hostname/physical/path/file.name).
 (And perhaps) Global Unique Identifier (GUID/GlobalFileID), used to refer a
specific file.
 Provides Space Management Functions.
 e.g. request free space in the future.
 Provides Directory and File Functions.
 e.g. srmLs lists directory contents. (Very useful for ILDG).
 srmMv/srmCp for moves and copies.
 etc...
 Provides Data Transfer Functions. (Very useful for ILDG).
 See http://sdm.lbl.gov/srm for more...
5
Andrew N. Jackson
May 2003
EDG ReplicaMetadataCatalog
 Aims to unify the metadata and replica management.
 Associating extra data with each entry in the replica catalogue.
 Still very early in its development.
 Can store ‘attributes’, and associate them with a given LFN.
 includes examples like filesize, date of creation, etc.
 It is not clear how our XML Schema would fit into this model.
 References
 The ‘RMC’, under EDG http://www.eu-datagrid.org/
 WP2: http://edg-wp2.web.cern.ch/edg-wp2/.
 See http://proj-grid-data-build.web.cern.ch/proj-grid-data-build/edg-metadatacatalog/ for more details.
6
Andrew N. Jackson
May 2003
ILDG Services
 To build on top of these tools
 Create SRM2.x interface to the QCDgrid.
 Implement (a subset of) the SRM 2.x spec. on top of the Globus Java CoG to
allow QCDgrid to ‘speak the same language’ as the U.S. SRMs.
 Define an aggregation scheme for pooling SRM contents:
 ildgURLs:The ILDG file namespace.
 Build the ILDG Web Services
 Providing an interface based on SRM2.x.
 Extra information will be needed.
 Providing web services to browse the ILDG file namespace:
 Directly.
 Or via the metadata.
 Build a ILDG Web Portal
 Public access via www.lqcd.org.
 Hooking into the ILDG Web Services.
7
Andrew N. Jackson
May 2003
ildgURLs
 Whatever interface we use, we need a way to identify
files on the ILDG Grid-Of-Grids:
 Suggest using an ‘ildgURL’:
ildg://srm.host/dir/name.ext.
 ildg:// a unique prefix.
 srm.host the hostname of the SRM machine.
 For QCDgrid, the entire system appears as a single SRM.
 qcdgrid.ac.uk and backup mirror on qcdgrid-mirror.ac.uk?
 For U.S. machines, each store appears as a separate SRM.
 /dir/name.ext a unique filename of a file for that SRM.
 Unique per SRM, or over the entire ILDG?
 Note that this scheme assumes one SRM catalogue per machine.
 Under Globus, we could have many catalogues each with a set of collections of
files.
 But this is functionality is probably not needed.
8
Andrew N. Jackson
May 2003
The ILDG Services
 The ildgFileService
 Based on a (read-only?) subset of SRM2.x.
 Uses ildgURLs instead of LFNs.
 Contains extra information about the SRM hosts, e.g. the WS URL.
 Adds extra aggregation information.
 Later, as SRM matures, adopt more of the protocol (and indeed software!).
 The aggregate service browses the ildgURL-space.
 But a idlg://url can be mapped onto physical host and allowed access protocol(s),
which will then allow file downloads.
 gsiftp://trumpton.ph.ed.ac.uk/grid/dir/filename
 http://www.qcdgrid.ac.uk/ildgFileServlet?file=grid/dir/filename
 The ildgMetadataService
 Implement a Metadata-based interface.
 Devise a web-form and/or browse-tree interface?
 Extend QCDgrid’s Schema-based browser?
9
Andrew N. Jackson
May 2003
Testing A Grid-Of-Grids
 A Simple Grid-Of-Globuses
 A web interface to a set of Globus 2.x replica catalogues.
 (Could not get access to U.S. SRMs in the available time).
 Structure of the code is essentially as ILDG requires.
 Code Overview:
 org.lqcd.srm.* package defines an abstract version of a cut-down SRM.
 org.lqcd.srm.globus.* is an implementation of that SRM using the Java CoG.
 org.lqcd.ildg.ildgFileService defines the aggregated service. Currently accessed
via a Java Servlet Page instead of as a Web Service.
 Generally successful, with three main suggestions arising:
 We need a public access mechanism to get into the U.S. SRMs.
 Directory structure and file information should be cached.
 The demo caches in the browser session. The first visit is very very slow!
 It takes about 10 mins to do a full lookup!
 We need a generic metadata-based interface.
 Without it, the different catalogues can only be unified at the root of the
aggregated directory structure (by the definition of the ildgURL).
10
Andrew N. Jackson
May 2003
Metadata
 A proper SRM supplies lots of information about files.
 Globus Replica Catalogues only supply filenames
 No file size, ownership, creation date etc, and so this must be looked up on the
remote physical host and/or the metadata catalogue.
 Therefore, QCDgrid MUST hook its metadata catalogue into the SRM
Web Service API.
 Initially provide just basic file information
 But we may as well also plan how the generic metadata-browsing
might fit it to avoid making life difficult later.
 But, browse by metadata requires standardising access
to our metadata catalogues
 QCDgrid essentially maps XPath queries to logical filenames.
 But what should we be doing?
11
Andrew N. Jackson
May 2003
Plan
 Goal by Lattice 2003:
 simple system based on SRM 2.1.
 srmLs interface to browse each others files via the ildgURL
namespace.
 support at least enough of the SRM interface to allow simple file
transfer.
 To do this...
 First set up actual Web Services
 Move the test code from the service to a web services.
 (Currently, the test code queries the Globus RC directly).
 Polish the code:
 e.g. Needs a better caching mechanism.
 Currently once-per-session.
 Should cache and refresh in the background.
 Then, develop portals to be hosted on www.lqcd.org/ildg
 Browsing and transfer, looking rather like a normal http/ftp directory structure?
12
Andrew N. Jackson
May 2003
For Discussion...
 The Plan:
 Implement SRM2.x subset for QCDgrid: srmLs.
 Implement the ildgFileService and a simple web portal.
 The File Space Browser:
 Agree a ‘Namespace’ of the Grid-Of-Grids:
 ildg://srm.hostname/global/file/id ?
 Should we also support multiple collections at a single SRM?
 Should we ensure that the GlobalFileIDs are unique across the ILDG?
 Access Policies...
 Public general access?
 Administration Privileges & Actions?
 Use ‘Catalogue-Caching’ in the ildgFileService? Cache metadata too?
 Portal/User Interface ideas?
 Metadata API (mapping queries to ildgURLs):
 XPath using the Schema? Or ‘Attributes’ associated with the GlobalFileIDs?
13
Andrew N. Jackson
May 2003
Appendixes...
14
Andrew N. Jackson
May 2003
Global File IDs
 Use an ID to precisely identify a file.
 As well as a logical filename?!
 Quoted in the metadata catalogues – allowing
expression of relationships between files.
 But still permit changing global file names.
15
Andrew N. Jackson
May 2003
Policies
 Public (Read-Only) Access.
 We must define and implement our general access policy.
 Currently, QCDgrid allows public read-only access, but only to the replica catalogue.
 Then we can work out how to access U.S. SRMs.
 Completely Public?
 No registration required to read the public data?
 Public, with simple registration?
 Like NERSCs Gauge Connection, http://qcd.nersc.gov/.
 Certificates used to grant privileges?
 Certified Individuals Only?
 No anonymous access at all?
 Additional Authorisation.
 Grid Certificates used to authorise richer access/functionality.
 e.g. a certificate held in your browser can be used to authorise you to move data between
grids.
 ‘Local’ Administration.
 All potentially destructive acts restricted to the local administrators?
16
Andrew N. Jackson
May 2003
Download