XML Database Technologies for the Grid
Rob Baxter
Stephen Booth, Neil Chue Hong, Matt Egbert,
Amy Krause, Andy Murdoch,
Charaka Palansuriya, Kira Smyllie,
Martin Westhead
Outline
▸ XML and the Grid - the perfect partnership
▸ OGSA Grid Services
▸ OGSA Data Access and Integration
▸ Grid XML Data Access Services
  – the DatabaseService interface
▸ Implementation using Xindice
  – XPath, XUpdate via Grid services
▸ Current work
  – DatabaseManagementService, DeliverySystem
▸ Plans
XML and the Grid
▸ XML is very important for the Grid
  – and becoming more so
▸ “But the Grid is about petabytes of binary data”
  – and you’re not going to mark up all that!
  – true…
  – but the Grid is also about metadata
    • scientific application metadata (e.g. simulation parameters)
    • technical metadata (e.g. the capabilities of the database I want to interact with)
    • binary metadata (what’s the format of my 4 TB binary file?)
▸ XML-based metadata key to Grid interactions
Sidebar: XML for binary metadata
▸ Related work at EPCC on binary metadata schema
▸ BinX v0.2 complete
  – it has a Java-based browser called JAJA…
▸ Provides an XML Schema for describing
  – basic types: ieeeFloat, ieeeDouble, byte, short, int, long
  – XDR types
  – binaryData: a block of data of given length and single basic type
  – array: an n-dimensional array of given type
  – arrayMulti: an n-dimensional array which has been split across multiple files
Accessing XML data on the Grid
▸ There are XML databases
  – and RDBs with XML capabilities
▸ There are XML access languages
  – XPath, XUpdate, XQuery
▸ And there’s the new model of the Grid
  – the Open Grid Services Architecture
▸ Our work is tying them together
OGSA Grid Services
▸ Open Grid Services Architecture
  – Announced at GGF 4, Toronto
  – Joint work between Globus and IBM
  – Marriage of Globus and Web Services
  – Work in progress…
  – GGF working group ogsi-wg@gridforum.org
▸ Key features
  – Dynamic service creation and destruction
  – Lifetime management and soft state
  – Draws together academic Grid with commercial Web
  – Backing from major players (IBM, Microsoft, Oracle…)
OGSA-DAI
▸ Data access and integration services for OGSA
  – Protocols for accessing databases over OGSA
    • XML; Relational/SQL; OO; other semi-structured data sources
  – Transparent integration of multiple databases
    • Running queries over multiple databases automatically
▸ Ideas gestated by UK eScience DBTF
▸ Development through GCP project
  – EPCC through NeSC
  – eSNW, NEReSC
  – IBM, Oracle
  – Funding from DTI, EPSRC, IBM UK, IBM US, Oracle
First stages
▸ OGSA-DAI prototyping
  – Feb - Jul 2002
  – Building systems from draft specs
  – Discovering what works
  – And what doesn’t :-)
  – Informal requirements discussion with UK Griddies
    • Formalise for later phases
▸ NeSC focus on architecture (WP2) and XML data services (WP3)
▸ IBM focus on RDBMS data services (WP6)
GXDS prototyping
▸ Baseline Grid Services spec
  – Tuecke et al, Feb 2002
▸ Baseline Grid Data Services spec
  – Atkinson et al, Feb 2002
▸ “Default implementation platform”
  – SOAP/HTTP binding for WSDL protocol
  – Apache Axis in Tomcat for hosting environment
▸ Apache Xindice database
  – Supports XPath, XUpdate as query & update languages
▸ Simple GUI client
Here’s a picture
[Figure: architecture diagram. Labels: Browser / Client, GXDS stub, SOAP, GXDS skeleton (axis/tomcat), XPath/XUpdate, Xindice XMLDB]
GXDS details
▸ A Grid XML Data Service requires
  – GridService PortType (Tuecke et al)
  – NotificationSource PortType (Tuecke et al)
  – DatabaseService PortType (us)
    • http://schemas.nesc.ac.uk/gridServices/GDS/GXDS
▸ DatabaseService currently defines
  – query
  – update
  – bulkLoad
  – schemaUpdate
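The four operations of the DatabaseService PortType can be pictured as a Java interface. This is only a sketch for orientation: the method signatures, parameter names and the toy in-memory class are assumptions made for illustration, and the real operations are defined in WSDL, not by this code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical Java view of the four DatabaseService operations. */
interface DatabaseServiceSketch {
    String query(String queryNotation, String query);
    boolean update(String updateNotation, String update);
    boolean bulkLoad(List<String> documents);
    boolean schemaUpdate(String newRecordSchema);
}

/** Toy in-memory stand-in; a real GXDS would delegate to the database. */
final class InMemoryDatabaseService implements DatabaseServiceSketch {
    private final List<String> documents = new ArrayList<>();
    private String recordSchema = "";

    public String query(String queryNotation, String query) {
        // No query engine here: just report how many documents are held.
        return "<count>" + documents.size() + "</count>";
    }

    public boolean update(String updateNotation, String update) {
        // Nothing can be updated in an empty store.
        return !documents.isEmpty();
    }

    public boolean bulkLoad(List<String> docs) {
        // True if at least one document was added.
        return documents.addAll(docs);
    }

    public boolean schemaUpdate(String newRecordSchema) {
        // Stored without checking; validation is out of scope for this sketch.
        recordSchema = newRecordSchema;
        return true;
    }
}
```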
What does a GXDS do?
▸ Users can perform standard database interactions
  – Query the database
  – Load XML documents into the database
  – Update records in the database
  – Update the database record schema
Record schema?
▸ The database has a “record schema” which all of the XML documents in the database must adhere to
  – Submitted and updated documents must be checked against the record schema
▸ The record schema is changeable to allow the database to evolve
  – When the record schema is changed, all documents in the database are validated against the new schema to see whether it is acceptable
    • If any documents fail validation, the new schema is rejected
    • In following these simple rules, the database remains backwards compatible
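The record-schema rules above can be sketched with the JDK's own validation API. This assumes the record schema is a W3C XML Schema (XSD), which the slides do not state; the class and method names are hypothetical, and nothing here is the GXDS implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical store that enforces a record schema (assumed to be XSD). */
final class RecordSchemaStore {
    private final List<String> documents = new ArrayList<>();
    private Schema recordSchema;

    RecordSchemaStore(String xsd) {
        recordSchema = compile(xsd);
    }

    private static Schema compile(String xsd) {
        try {
            SchemaFactory f = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return f.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema does not compile: " + e.getMessage(), e);
        }
    }

    private static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    /** Submitted documents are checked against the current record schema. */
    boolean add(String xml) {
        if (!isValid(recordSchema, xml)) {
            return false;
        }
        documents.add(xml);
        return true;
    }

    /** The new schema is accepted only if every stored document validates against it. */
    boolean schemaUpdate(String newXsd) {
        Schema candidate = compile(newXsd);
        for (String doc : documents) {
            if (!isValid(candidate, doc)) {
                return false; // reject: the old schema stays in force
            }
        }
        recordSchema = candidate;
        return true;
    }
}
```

Under this sketch, adding an optional element to the schema is accepted, while adding a required element that existing documents lack is rejected.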
DatabaseService::query
▸ Inputs
  – queryNotation
    • A query notation supported by the db (e.g. XPath)
  – query
    • A query conforming to queryNotation
  – values (optional)
    • Any query parameters
  – txHandle
    • A transaction handle (for later work)
  – expires (optional)
    • Time until which results may be claimed
  – resultHandle (optional)
    • The GSH of a DeliverySystem
DatabaseService::query
▸ Outputs
  – status
    • Boolean success or failure
  – result
    • Either the results as a string OR
    • A GSH of a DeliverySystem
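The inputs and outputs of DatabaseService::query can be written as two small Java holder classes. The field names follow the slides; the Java types, the use of null for absent optional values, and the rule in `success` for choosing between inline results and a GSH are assumptions made for this sketch.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the inputs of DatabaseService::query. */
final class QueryRequest {
    final String queryNotation;  // a notation supported by the db, e.g. "XPath"
    final String query;          // a query conforming to queryNotation
    final List<String> values;   // optional query parameters; empty if none
    final String txHandle;       // transaction handle (for later work)
    final Instant expires;       // optional; null if no deadline was given
    final String resultHandle;   // optional GSH of a DeliverySystem; null if absent

    QueryRequest(String queryNotation, String query, List<String> values,
                 String txHandle, Instant expires, String resultHandle) {
        this.queryNotation = queryNotation;
        this.query = query;
        this.values = values == null ? Collections.<String>emptyList() : values;
        this.txHandle = txHandle;
        this.expires = expires;
        this.resultHandle = resultHandle;
    }
}

/** Hypothetical holder for the outputs: a status plus inline results or a GSH. */
final class QueryResult {
    final boolean status;
    final String result;
    final boolean resultIsHandle;

    private QueryResult(boolean status, String result, boolean resultIsHandle) {
        this.status = status;
        this.result = result;
        this.resultIsHandle = resultIsHandle;
    }

    /** Returns inline results unless the request named a DeliverySystem. */
    static QueryResult success(QueryRequest request, String results) {
        if (request.resultHandle != null) {
            return new QueryResult(true, request.resultHandle, true);
        }
        return new QueryResult(true, results, false);
    }

    static QueryResult failure() {
        return new QueryResult(false, "", false);
    }
}
```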
DeliverySystem?
▸ Third-party service for data transfer
▸ Service requesters can use DSes as proxies
  – For long-lived queries
  – For large data transfers
▸ DSes can implement high-performance data transfer
  – e.g. GridFTP instead of SOAP/HTTP
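The proxy role of a DeliverySystem can be sketched as a place where a data service parks results and a requester claims them later. Everything below is hypothetical: the class, the ticket strings and the claim-once behaviour are assumptions, and a real DeliverySystem would be a Grid service reached through its GSH rather than an in-memory map.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a DeliverySystem. */
final class DeliverySystemSketch {
    private static final class Held {
        final String data;
        final Instant expires;

        Held(String data, Instant expires) {
            this.data = data;
            this.expires = expires;
        }
    }

    private final Map<String, Held> held = new HashMap<>();

    /** Called by the data service: park results until the given time. */
    void deposit(String ticket, String data, Instant expires) {
        held.put(ticket, new Held(data, expires));
    }

    /** Called by the requester: returns the results once, if not yet expired. */
    Optional<String> claim(String ticket, Instant now) {
        Held h = held.remove(ticket);
        if (h == null || now.isAfter(h.expires)) {
            return Optional.empty();
        }
        return Optional.of(h.data);
    }
}
```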
GXDS aims to be generic
▸ Note that XML specifics are confined to parameters such as queryNotation
  – Protocol should be generic
  – Database and query specifics only come in at endpoints
▸ This way, GXDS prototype = GDS prototype
  – At least at the protocol level
▸ GXDS ideas + IBM’s GRDS ideas will help explore the architecture of GDS (WP2)
GXDS prototype
▸ Demo completed so far
  – WSDL spec for GXDS vn. alpha-1.0
  – Java implementation in axis/tomcat environment
    • http://garnet.epcc.ed.ac.uk:8080/axis/services/DatabaseService
  – DatabaseService::query
  – DatabaseService::update
  – DatabaseService::bulkLoad
  – DatabaseService::schemaUpdate
  – To be released soon to early adopters and interested parties
▸ Let’s look at some implementation details
Summary of technologies
▸ Xindice: Apache’s XML Database
▸ Axis: Apache’s SOAP/Web Service server
▸ Tomcat: Apache’s servlet hosting engine
▸ Grid Service: An enhanced web-service
[Figure: a SOAP request arrives at axis running in tomcat; the gxds skeleton uses the Xindice API to reach the Xindice db]
Implementation
▸ The great thing about using Xindice is that almost all of the functionality we needed was supported directly by the Xindice API
  – Adding documents to the database
  – Querying the database using XPath
  – Updating the database using XUpdate
▸ All we had to do was wrap up Xindice API calls in Tomcat
▸ Let’s take a closer look at our use of the Xindice API
Initialising the Xindice API
▸ These are the few lines of initialisation code which are necessary for interactions with a Xindice database
import org.xmldb.api.DatabaseManager;
import org.xmldb.api.base.Collection;
import org.xmldb.api.base.Database;

String driver = "org.apache.xindice.client.xmldb.DatabaseImpl";
Class c = Class.forName(driver);
Database database = (Database) c.newInstance();
DatabaseManager.registerDatabase(database);
// The URI specifies which database and which collection to open
Collection col = DatabaseManager.getCollection("xmldb:xindice:///db/addressbook");
Adding documents
▸ Adding documents to the database is very simple:
import org.xmldb.api.modules.XMLResource;

String values = "<eg_xml> This will be added </eg_xml>";
// createResource returns a generic Resource, so cast it to XMLResource
XMLResource add_me = (XMLResource) col.createResource(null, "XMLResource");
add_me.setContent(values);
col.storeResource(add_me);
Querying using XPath
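import org.xmldb.api.base.ResourceSet;
import org.xmldb.api.modules.XPathQueryService;

String xpath = "//person/name/*";
// getService returns a generic Service, so cast it to the XPath service
XPathQueryService xp_service =
    (XPathQueryService) col.getService("XPathQueryService", "1.0");
ResourceSet resultSet = xp_service.query(xpath);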
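To see what an expression like `//person/name/*` selects without a running Xindice server, the same XPath can be evaluated with the JDK's built-in `javax.xml.xpath` API. This swaps the Xindice XPathQueryService for the JDK evaluator; the sample address-book document and the class name are made up for the sketch.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathSketch {
    /** Evaluates the expression and returns "elementName=text" for each selected node. */
    static List<String> select(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node n = nodes.item(i);
                out.add(n.getNodeName() + "=" + n.getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample document, not taken from the real addressbook collection.
        String xml = "<addressbook><person><name><first>Ada</first><last>Lovelace</last></name>"
                + "<phone>555-0100</phone></person></addressbook>";
        System.out.println(select(xml, "//person/name/*")); // prints [first=Ada, last=Lovelace]
    }
}
```

The expression selects every child element of each `name` element, so `phone` is not returned.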
Updating using XUpdate
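import org.xmldb.api.modules.XUpdateQueryService;

String update = "XUPDATE STRING GOES HERE";
// Use the same collection handle (col) as in the earlier snippets
XUpdateQueryService xupdate_service =
    (XUpdateQueryService) col.getService("XUpdateQueryService", "1.0");
xupdate_service.update(update);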
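For readers who have not seen XUpdate, the string passed to the update call is itself an XML document. The example below is purely illustrative: the target path and the new phone number are invented, and the helper only checks with the JDK parser that the string is well-formed; it does not apply the update to any database.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class XUpdateExample {
    /** Illustrative XUpdate document: replace the text of the first person's phone. */
    static final String UPDATE =
            "<xupdate:modifications version=\"1.0\""
            + " xmlns:xupdate=\"http://www.xmldb.org/xupdate\">"
            + "<xupdate:update select=\"/addressbook/person[1]/phone\">555-0199</xupdate:update>"
            + "</xupdate:modifications>";

    /** Parses the string with a namespace-aware parser and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            Document d = f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return d.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }
}
```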
The Exceptions
▸ All interactions with a Xindice database can throw XMLDBExceptions
try {
    // any Xindice interactions go here
} catch (XMLDBException e) {
    // Re-throw as the service's own exception type, keeping the message
    DatabaseAccessException f = new DatabaseAccessException(e.getMessage());
    successful_update = false;
    throw f;
}
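The wrap-and-rethrow pattern above can be tried without Xindice on the classpath. In this sketch `BackendException` is a made-up stand-in for XMLDBException, `DatabaseAccessException` is redefined locally with an assumed constructor, and passing the original exception along as the cause is a suggested variation, not what the slide code does.

```java
/** Local stand-in for the service's exception type (constructor is an assumption). */
final class DatabaseAccessException extends Exception {
    DatabaseAccessException(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Made-up stand-in for XMLDBException so the sketch needs no Xindice jar. */
final class BackendException extends Exception {
    BackendException(String message) {
        super(message);
    }
}

/** Any database interaction that may fail with the backend's exception. */
interface BackendCall {
    void run() throws BackendException;
}

final class UpdateRunner {
    boolean successfulUpdate = true;

    /** Runs the call; on failure records the outcome and rethrows with the cause attached. */
    void run(BackendCall call) throws DatabaseAccessException {
        try {
            call.run();
            successfulUpdate = true;
        } catch (BackendException e) {
            successfulUpdate = false;
            throw new DatabaseAccessException(e.getMessage(), e);
        }
    }
}
```

Keeping the cause means the backend's stack trace is still available to whoever logs the wrapped exception.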
Current work
▸ Grid XML DatabaseManagementService
  – Supports GS Factory PortType
  – Creates GXDSes upon request
    • To service a given user connection to the db
    • To run a particular, long-lived query and then close
  – Initial prototype will look at creating GXDSes within same hosting environment (i.e. tomcat)
▸ Grid XML DeliverySystem
  – “Run my query, but don’t send the results back to me, send them to this guy”
  – Look initially at simple SOAP message forwarding
  – Later DeliverySystems will support GridFTP
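The factory role of the DatabaseManagementService can be sketched as an object that creates service instances on request and forgets them when they are closed. All names here are hypothetical, and the short string handles merely stand in for Grid Service Handles; the real Factory PortType is defined by the Grid Services spec, not by this code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical record of one GXDS created by the factory. */
final class GxdsInstance {
    final String handle;
    final String purpose;

    GxdsInstance(String handle, String purpose) {
        this.handle = handle;
        this.purpose = purpose;
    }
}

/** Hypothetical factory: creates GXDS instances within one hosting environment. */
final class DatabaseManagementServiceSketch {
    private final Map<String, GxdsInstance> live = new HashMap<>();
    private int counter = 0;

    /** Creates a new instance and returns its handle (a stand-in for a GSH). */
    String createService(String purpose) {
        String handle = "gxds-" + (++counter);
        live.put(handle, new GxdsInstance(handle, purpose));
        return handle;
    }

    /** Destroys an instance, e.g. once a long-lived query has finished. */
    boolean destroy(String handle) {
        return live.remove(handle) != null;
    }

    int liveCount() {
        return live.size();
    }
}
```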
Medium term
▸ Formal review Aug-Sep (post GGF5)
  – Refine GDS ideas with GXDS and GRDS discoveries
▸ Seek detailed feedback on strawmen from UK e-Science
  – AstroGrid, MyGrid, GridPP, RealityGrid, eDIKT
▸ Develop revised GDS specs
▸ Develop reference implementations
  – XMLDBs, RDBs, semi-structured file data (?)
  – Java/SOAP (the default)
  – Enterprise Java Beans, Microsoft .NET framework (?)
Summary
▸ XML will be the metadata language for the Grid
▸ XML databases are becoming increasingly important for complex metadata
▸ OGSA-DAI up and running at EPCC & IBM
▸ Early Grid XML Data Services spec defined
▸ Early demonstration prototype working
▸ Harder (but more interesting!) bits ongoing
▸ Formal reference implementations to start Q4