XML for Data Grid Applications Chip Watson Thomas Jefferson National Accelerator Facility

May 31, 2016
PPDG Meeting
Why XML?
-- Industry Trends
Strategy: Use web technologies, follow the success of the web...
E-commerce companies (especially B2B) are currently investing heavily
in XML technologies...
Example news items:
[December 11, 2000] "iPlanet Unveils Industry's First Full-Up B2B
Commerce Platform… [based upon XML]"
[December 08, 2000] "Schemantix (formerly Praxis) to Launch
Schemantix Development Platform (SxDP) at XML 2000."
"Microsoft is augmenting its OLE DB for OLAP protocol with new
interfaces based on XML… 'The brass tacks on this is we're all going
to run our analytical apps over the Internet, and the language these
apps will use to communicate with their data sources will be XML,'
says Clay Young, VP of marketing at online analytical processing
software vendor Knosys Inc." -- InformationWeek, Dec 7, 2000
What is XML ?
eXtensible Markup Language
– Like HTML, but with user defined tags
– Tags refer to content, not presentation:
<?xml version='1.0' encoding='ISO-8859-1'?>
<directory name="/clas" owner="root" group="other" modified="Aug 22 08:34">
  <file name='97-12'/>
  <file name='98-02'/>
  <file name='98-03'/>
  <directory name='comm97'/>
  <directory name='e1'/>
</directory>
– Properties of node: the attributes on the opening <directory> tag
– Node contents: the nested <file> and <directory> elements
– XML has a tree data model
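A minimal sketch of reading the listing above as a tree, assuming Python and its standard xml.etree.ElementTree module (the slides do not prescribe a parser). The helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

# The directory listing from the slide, kept as bytes so the declared
# ISO-8859-1 encoding is honored by the parser.
LISTING = b"""<?xml version='1.0' encoding='ISO-8859-1'?>
<directory name="/clas" owner="root" group="other" modified="Aug 22 08:34">
  <file name='97-12'/>
  <file name='98-02'/>
  <file name='98-03'/>
  <directory name='comm97'/>
  <directory name='e1'/>
</directory>"""


def describe(node):
    """Return (tag, properties, contents) for one node of the tree.

    Properties are the node's attributes; contents are its child elements,
    reported as (tag, name) pairs.
    """
    contents = [(child.tag, child.get("name")) for child in node]
    return node.tag, dict(node.attrib), contents


root = ET.fromstring(LISTING)
tag, properties, contents = describe(root)
```

Here properties holds the four attributes of the top-level directory, and contents lists the three files and two subdirectories in document order.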
XML vs CORBA
• XML is more verbose
– data transported as character strings (~2x for float)
– data is self describing, with string tags (~2x)
(however, lists are separated by single whitespace, so
string lists are carried with little overhead)
• CORBA is harder to deploy
– requires ORB, complex libraries, name server, etc.
• Both are language neutral
– XML supported in C/C++, Java, Perl, etc.
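A rough illustration of the size comparison above, assuming Python and 4-byte IEEE 754 single-precision floats as the binary baseline (the slide does not say which binary encoding it compares against). The function names are hypothetical, and the actual ratio depends on how many digits each value needs.

```python
import struct


def binary_size(values):
    """Bytes needed to send the values as packed 4-byte floats."""
    return len(struct.pack(f">{len(values)}f", *values))


def text_size(values):
    """Bytes needed to send the values as a whitespace-separated string."""
    return len(" ".join(repr(v) for v in values).encode("ascii"))


single = [3.14159]
sizes = (binary_size(single), text_size(single))
```

For the single value 3.14159 this gives 4 bytes in binary form against 7 bytes as text, which is in line with the rough factor quoted on the slide.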
What about SOAP ?
Simple Object Access Protocol
SOAP is a protocol specification for invoking methods on
servers, services, components and objects (RPC system).
SOAP codifies the existing practice of using XML and
HTTP as a method invocation mechanism.
The SOAP specification mandates a small number of
HTTP headers that facilitate firewall/proxy filtering.
The SOAP specification also mandates an XML
vocabulary that is used for representing method
parameters, return values, and exceptions.
Simple POST vs SOAP
• Simple POST
– query contains tagged string values, like
http://xxx.yyy.zzz/page?name=xyzzy&owner=watson
• SOAP
– query contains structured arguments, even user defined
types (example to follow)
In either case, the response is an HTTP response of type XML, with
arbitrary (tree-like) structure
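A sketch of how a client might build the simple form of the query, assuming Python's standard urllib. The host xxx.yyy.zzz is the placeholder from the slide, and build_simple_request is a hypothetical helper. Nothing is sent over the network here; the request object is only constructed.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://xxx.yyy.zzz/page"  # placeholder host from the slide


def build_simple_request(params, use_post=True):
    """Build a request carrying tagged string values.

    With use_post=True the values travel in the request body; otherwise
    they are appended to the URL as a query string, as written on the slide.
    """
    query = urlencode(params)
    if use_post:
        return Request(BASE_URL, data=query.encode("ascii"), method="POST")
    return Request(f"{BASE_URL}?{query}", method="GET")


post_request = build_simple_request({"name": "xyzzy", "owner": "watson"})
get_request = build_simple_request({"name": "xyzzy", "owner": "watson"}, use_post=False)
```

Either request could then be passed to urllib.request.urlopen, and the XML reply parsed into a tree on the client side.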
SOAP structure example
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <ppdg:AddFile xmlns:ppdg="http://schemas.ppdg.org/soap/xmlns.ppdg">
      <directory>/clas/90-03/</directory>
      <file>test7.dat
        <owner name="watson"/>
        <activity name="calibration"/>
      </file>
    </ppdg:AddFile>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
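A sketch of how a server might pull the method name and structured arguments out of the message above, assuming Python's standard xml.etree.ElementTree rather than a SOAP toolkit. The helper parse_call is hypothetical, and it assumes the Body holds exactly one call element.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

MESSAGE = """<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <ppdg:AddFile xmlns:ppdg="http://schemas.ppdg.org/soap/xmlns.ppdg">
      <directory>/clas/90-03/</directory>
      <file>test7.dat
        <owner name="watson"/>
        <activity name="calibration"/>
      </file>
    </ppdg:AddFile>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def parse_call(message):
    """Extract the method name and arguments from a SOAP-style envelope."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]  # assumption: the Body holds exactly one method call
    # ElementTree writes qualified tags as "{namespace}local"; keep the local part.
    method = call.tag.split("}", 1)[1]
    file_node = call.find("file")
    return {
        "method": method,
        "directory": call.findtext("directory"),
        "file": file_node.text.strip(),
        "owner": file_node.find("owner").get("name"),
        "activity": file_node.find("activity").get("name"),
    }


call = parse_call(MESSAGE)
```

The nested owner and activity elements are the kind of structured argument that a flat query string cannot express directly.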
Analysis: Simple vs SOAP
• ReplicaCatalog & ReplicaHost (OO api)
– need to send method name & [0-2] string args
• Future catalog queries
– may need to send many selection criteria, but this could be done as a
simple query string (hence 1 argument)
– question: may want to “batch” requests, sending, for example, an
array of file names to resolve ?
[could be done as many single calls, and let TCP buffer]
• Conclusion: Requirements do NOT dictate SOAP
– May still choose SOAP for standardization reasons…although the
proposer does not have a good track record here
Prototyping XML at Jlab
Goals:
• Get experience w/ XML
• Get experience w/ using XML in servlets
• Demonstrate feasibility of using XML as web protocol for
ReplicaCatalog and ReplicaHost
• Deploy prototype replica system for experimental physics
data stored in Jlab silo
– currently OSM + custom java infrastructure
– plan to replace OSM, resulting in pure java infrastructure
XML & HTML
[Figure: architecture diagram with boxes labeled sql db, ldap db, corba obj, XML servlet, xml client, style sheet, HTML servlet, html client]
Two types of servlets are used: one generates XML; the other calls
the first and uses a library (a few calls) to apply a style sheet to
the XML and generate HTML
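A sketch of the two-stage arrangement, written in Python for brevity. Python's standard library has no XSLT processor, so a hand-written function stands in for the style sheet here; this is a substitute technique, not the prototype's servlet code. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


def xml_stage():
    """Stand-in for the XML servlet: return a directory listing as XML text."""
    return (
        '<directory name="/clas" owner="root">'
        "<file name='97-12'/><directory name='e1'/>"
        "</directory>"
    )


def html_stage(xml_text):
    """Stand-in for the HTML servlet: turn the XML listing into an HTML list.

    A real deployment would apply an XSL style sheet at this point.
    """
    root = ET.fromstring(xml_text)
    items = "".join(
        f"<li>{escape(child.get('name'))} ({child.tag})</li>" for child in root
    )
    return f"<h1>{escape(root.get('name'))}</h1><ul>{items}</ul>"


page = html_stage(xml_stage())
```

The point of the split is that XML clients can consume the first stage directly, while browsers get the second stage without any change to the data source.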
Prototype Components
• ReplicaCatalog
– java servlet producing XML
– xsl style sheet to translate this to html for browsers
– servlet to do formatting (via style sheet)
• ReplicaHost
– java servlet producing XML
– xsl style sheet to translate this to html for browsers
– servlet to do formatting (via style sheet)
• Simple file transfer servers
– currently bbftpd, but soon httpd, gsiftpd
Replica Catalog
• Implemented as Java servlet (Apache + Tomcat)
– currently forks "rsh ls /mss …" to get a listing of the silo
contents (for demo purposes)
– will use mysql via jdbc for persistent store (very soon)
– supports tree data model (maps existing silo system)
• Produces XML output
for directory:
• listing of one directory, contents are files + subdirectories
• includes properties of this directory (owner, etc.)
for file:
• properties of the file (owner, etc.)
• ReplicaHost(s) holding the file
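A sketch of the two kinds of XML output described above, assuming Python's xml.etree.ElementTree. The directory reply reuses the tag and attribute names from the earlier listing; the replicahost tag and both function names are hypothetical, since the slides do not give the file reply format.

```python
import xml.etree.ElementTree as ET


def directory_listing(name, owner, files, subdirectories):
    """Build the reply for a directory: its properties plus its contents."""
    node = ET.Element("directory", {"name": name, "owner": owner})
    for file_name in files:
        ET.SubElement(node, "file", {"name": file_name})
    for dir_name in subdirectories:
        ET.SubElement(node, "directory", {"name": dir_name})
    return ET.tostring(node, encoding="unicode")


def file_properties(name, owner, hosts):
    """Build the reply for a file: its properties plus the hosts holding it."""
    node = ET.Element("file", {"name": name, "owner": owner})
    for host in hosts:
        # "replicahost" is a hypothetical tag name, not taken from the slides.
        ET.SubElement(node, "replicahost", {"name": host})
    return ET.tostring(node, encoding="unicode")


listing = directory_listing("/clas", "root", ["97-12", "98-02"], ["e1"])
file_reply = file_properties("file7.dat", "watson", ["xxx.jlab.org"])
```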
Replica Host
• Gives access information (disk-resident, offline, etc.)
• If disk resident, locally translates file name (virtual path) to
URL(s), indicating supported protocols, such as
http://xxx.jlab.org/diskcache9/clas/file7.dat
bbftp://bbftp.jlab.org/diskcache9/clas/file7.dat
gsiftp://xxx.jlab.org/diskcache9/clas/file7.dat
• Future (within 1-2 months):
– support request to stage to disk
– support request to "pin" a file (advisory only)
– support request to store a file (push and/or pull?)
– manage update to catalog in response to local deletions of files
– web pages to fetch any file via browser
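A sketch of the current ReplicaHost behaviour (access information plus translation of a virtual path to URLs), written in Python. The lookup tables and the function name resolve are hypothetical; the host names and the /diskcache9 area are the placeholders from the slide.

```python
# Hypothetical mapping from protocol to the server that speaks it.
SERVERS = {
    "http": "xxx.jlab.org",
    "bbftp": "bbftp.jlab.org",
    "gsiftp": "xxx.jlab.org",
}

# Hypothetical table of disk-resident files: virtual path -> cache area.
DISK_RESIDENT = {"/clas/file7.dat": "/diskcache9"}


def resolve(virtual_path):
    """Return access information and, if disk resident, one URL per protocol."""
    cache_area = DISK_RESIDENT.get(virtual_path)
    if cache_area is None:
        return {"status": "offline", "urls": []}
    urls = [
        f"{protocol}://{host}{cache_area}{virtual_path}"
        for protocol, host in SERVERS.items()
    ]
    return {"status": "disk-resident", "urls": urls}


info = resolve("/clas/file7.dat")
missing = resolve("/clas/other.dat")
```

A client would pick whichever returned URL matches a protocol it supports.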
Demo
• xml test of ReplicaCatalog viewed as xml
• processed with style sheet & viewed as html
Note: Directory Model Changed
Recommendation:
– Change the catalog data model to allow file system
(tree) semantics in the logical name space.
– Containers appear to be hierarchical
– Actual containers may still be flat:
/a/b/c is one container
/a/b/c/d/e is a separate container
/a/b/c appears to contain “d” (even if not implemented
that way in storage)
This will probably be more attractive to physicists
and other users.
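A sketch of how flat container names can be made to look hierarchical, written in Python. The function name apparent_children is hypothetical, and the container list is the example from the slide.

```python
def apparent_children(containers, path):
    """List the names that appear directly under path.

    The containers are stored flat; the hierarchy is derived purely from
    the "/" separators in their names.
    """
    prefix = path.rstrip("/") + "/"
    children = set()
    for name in containers:
        if name.startswith(prefix):
            # Keep only the first path component below the prefix.
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)


containers = ["/a/b/c", "/a/b/c/d/e"]
under_c = apparent_children(containers, "/a/b/c")
under_b = apparent_children(containers, "/a/b")
```

Here /a/b/c appears to contain "d" even though no container named /a/b/c/d exists in storage.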
Future Activities
1. Finish SQL database for ReplicaCatalog
2. Finish integration of ReplicaHost and Jlab silo
3. Create exportable package for ReplicaHost
– Disk cache manager (java based)
• mountable by local clients
– ReplicaHost (java servlet based)
– File transfer daemons
• http
• bbftp
• gsiftp
• gridftp
PPDG Sub-project (1)
Protocol standardization
– choice of simple or SOAP
– standardization of method names and / or arguments
for requests
– XML tag name standardization
– response standardization (e.g. one directory listing)
PPDG Sub-project (2)
1. Shared ReplicaCatalog servlet implementation
– standardize java interface to local persistent store
– implement reference implementations:
1. above LDAP (compatible w/ or extending Globus solution)
2. above JDBC (Jlab design, open to revisions of schema)
2. Shared ReplicaHost servlet implementation
– standardize java interface to local silo, disk managers
– implement reference implementations:
1. CORBA calls to SRB
2. RMI calls to Jlab disk & silo managers
3. other?
PPDG Sub-project (2, continued)
3. C/C++ and Java client libraries
– for Java & C++, implementing an OO api with local
browsing of xml data
4. Extend ReplicaHost to support queueing of
transfer requests...
...to/from other ReplicaHosts
– negotiate transfer protocol with other host
– negotiate push/pull with other host
...to/from remote transfer daemon
– protocol and direction fixed
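One possible shape for queueing with protocol negotiation, sketched in Python. The slides do not define the negotiation rule; this sketch assumes the local host picks the first protocol in its own preference order that the other host also supports. All names are hypothetical.

```python
from collections import deque


def negotiate_protocol(local_preferences, remote_supported):
    """Return the first locally preferred protocol the remote host supports."""
    for protocol in local_preferences:
        if protocol in remote_supported:
            return protocol
    return None


def enqueue_transfer(queue, file_name, local_preferences, remote_supported):
    """Negotiate a protocol and append the transfer request to the queue."""
    protocol = negotiate_protocol(local_preferences, remote_supported)
    if protocol is None:
        raise ValueError(f"no common transfer protocol for {file_name}")
    queue.append((file_name, protocol))
    return protocol


queue = deque()
chosen = enqueue_transfer(
    queue,
    "/clas/file7.dat",
    ["gridftp", "gsiftp", "http"],
    {"http", "bbftp", "gsiftp"},
)
```

For a remote transfer daemon, where protocol and direction are fixed, the negotiation step would be skipped and the request appended directly.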