XML Database Technologies for the Grid
Rob Baxter, Stephen Booth, Neil Chue Hong, Matt Egbert, Amy Krause, Andy Murdoch, Charaka Palansuriya, Kira Smyllie, Martin Westhead

Outline
- XML and the Grid – the perfect partnership
- OGSA Grid Services
- OGSA Data Access and Integration
- Grid XML Data Access Services – the DatabaseService interface
- Implementation using Xindice – XPath, XUpdate via Grid services
- Current work – DatabaseManagementService, DeliverySystem
- Plans

XML and the Grid
- XML is very important for the Grid
  – and becoming more so
- “But the Grid is about petabytes of binary data”
  – and you’re not going to mark up all that!
  – true…
  – but the Grid is also about metadata
    • scientific application metadata (e.g. simulation parameters)
    • technical metadata (e.g. the capabilities of the database I want to interact with)
    • binary metadata (what’s the format of my 4 TB binary file?)
- XML-based metadata is key to Grid interactions

Sidebar: XML for binary metadata
- Related work at EPCC on a binary metadata schema
- BinX v0.2 complete – it has a Java-based browser called JAJA…
- Provides an XML Schema for describing
  – basic types: ieeeFloat, ieeeDouble, byte, short, int, long
  – XDR types
  – binaryData – a block of data of given length and single basic type
  – array – an n-dimensional array of given type
  – arrayMulti – an n-dimensional array which has been split across multiple files

Accessing XML data on the Grid
- There are XML databases
  – and RDBs with XML capabilities
- There are XML access languages
  – XPath, XUpdate, XQuery
- And there’s the new model of the Grid
  – the Open Grid Services Architecture
- Our work is tying them together

OGSA Grid Services
- Open Grid Services Architecture
  – Announced at GGF4, Toronto
  – Joint work between Globus and IBM
  – Marriage of Globus and Web Services
  – Work in progress…
  – GGF working group ogsi-wg@gridforum.org
- Key features
  – Dynamic service creation and destruction
  – Lifetime management and soft state
  – Draws together the academic Grid with the commercial Web
  – Backing from major players (IBM, Microsoft, Oracle…)

OGSA-DAI
- Data access and integration services for OGSA
  – Protocols for accessing databases over OGSA
    • XML; Relational/SQL; OO; other semi-structured data sources
  – Transparent integration of multiple databases
    • Running queries over multiple dbs automatically
- Ideas gestated by UK eScience DBTF
- Development through GCP project
  – EPCC through NeSC
  – eSNW, NEReSC
  – IBM, Oracle
  – Funding from DTI, EPSRC, IBM UK, IBM US, Oracle

First stages
- OGSA-DAI prototyping
  – Feb–Jul 2002
  – Building systems from draft specs
  – Discovering what works
  – And what doesn’t :-)
  – Informal requirements discussion with UK Griddies
    • Formalise for later phases
- NeSC focus on architecture (WP2) and XML data services (WP3)
- IBM focus on RDBMS data services (WP6)

GXDS prototyping
- Baseline Grid Services spec
  – Tuecke et al, Feb 2002
- Baseline Grid Data Services spec
  – Atkinson et al, Feb 2002
- “Default implementation platform”
  – SOAP/HTTP binding for WSDL protocol
  – Apache Axis in Tomcat for hosting environment
- Apache Xindice database
  – Supports XPath, XUpdate as query & update languages
- Simple GUI client

Here’s a picture
[Diagram: Browser / Client → GXDS stub → SOAP → GXDS skeleton (axis/tomcat) → XPath/XUpdate → Xindice XMLDB]

GXDS details
- A Grid XML Data Service requires
  – GridService PortType (Tuecke et al)
  – NotificationSource PortType (Tuecke et al)
  – DatabaseService PortType (us)
    • http://schemas.nesc.ac.uk/gridServices/GDS/GXDS
- DatabaseService currently defines
  – query
  – update
  – bulkLoad
  – schemaUpdate

What does a GXDS do?
- Users can perform standard database interactions
  – Query the database
  – Load XML documents into the database
  – Update records in the database
  – Update the database record schema

Record schema?
- The database has a “record schema” to which every XML document in the database must conform
  – Submitted and updated documents must be checked against the record schema
- The record schema can be changed, allowing the database to evolve
  – When a schema change is requested, every document already in the database is validated against the proposed schema
    • If any document fails validation, the new schema is rejected
    • By following these simple rules, the database remains backwards compatible

DatabaseService::query
- Inputs
  – queryNotation
    • A query notation supported by the db (e.g. XPath)
  – query
    • A query conforming to queryNotation
  – values (optional)
    • Any query parameters
  – txHandle
    • A transaction handle (for later work)
  – expires (optional)
    • Time until which results may be claimed
  – resultHandle (optional)
    • The Grid Service Handle (GSH) of a DeliverySystem
- Outputs
  – status
    • Boolean success or failure
  – result
    • Either the results as a string, OR
    • The GSH of a DeliverySystem

DeliverySystem?
- Third-party service for data transfer
- Service requesters can use DeliverySystems (DSes) as proxies
  – For long-lived queries
  – For large data transfers
- DSes can implement high-performance data transfer
  – e.g. GridFTP instead of SOAP/HTTP

GXDS aims to be generic
- Note that XML specifics appear only in parameter values such as queryNotation
  – The protocol itself should be generic
  – Database and query specifics only come in at the endpoints
- This way, the GXDS prototype = a GDS prototype
  – At least at the protocol level
- GXDS ideas + IBM’s GRDS ideas will help explore the architecture of GDS (WP2)

GXDS prototype
- Demo completed so far
  – WSDL spec for GXDS version alpha-1.0
  – Java implementation in axis/tomcat environment
    • http://garnet.epcc.ed.ac.uk:8080/axis/services/DatabaseService
  – DatabaseService::query
  – DatabaseService::update
  – DatabaseService::bulkLoad
  – DatabaseService::schemaUpdate
  – To be released soon to early adopters and interested parties
- Let’s look at some implementation details

Summary of technologies
- Xindice: Apache’s XML database
- Axis: Apache’s SOAP/Web Service server
- Tomcat: Apache’s servlet hosting engine
- Grid Service: an enhanced Web service
[Diagram: SOAP request → tomcat / axis → gxds skeleton → Xindice API → Xindice db]

Implementation
- The great thing about using Xindice is that almost all of the functionality we needed was supported directly by the Xindice API
  – Adding documents to the database
  – Querying the database using XPath
  – Updating the database using XUpdate
- All we had to do was wrap Xindice API calls in a service hosted by Axis in Tomcat
- Let’s take a closer look at our use of the Xindice API

Initialising the Xindice API
- Only a few lines of initialisation code are needed before interacting with a Xindice database:

    import org.xmldb.api.DatabaseManager;
    import org.xmldb.api.base.Collection;
    import org.xmldb.api.base.Database;

    String driver = "org.apache.xindice.client.xmldb.DatabaseImpl";
    Class<?> c = Class.forName(driver);
    Database database = (Database) c.newInstance();
    DatabaseManager.registerDatabase(database);
    // Specifies which database and which collection…
    Collection col = DatabaseManager.getCollection("xmldb:xindice:///db/addressbook");

Adding documents
- Adding a document to the database is very simple:

    import org.xmldb.api.modules.XMLResource;

    String values = "<eg_xml> This will be added </eg_xml>";
    // createResource returns a generic Resource, so cast it; a null id lets the collection generate one
    XMLResource add_me = (XMLResource) col.createResource(null, "XMLResource");
    add_me.setContent(values);
    col.storeResource(add_me);

Querying using XPath

    import org.xmldb.api.base.ResourceSet;
    import org.xmldb.api.modules.XPathQueryService;

    String xpath = "//person/name/*";
    // getService returns a generic Service, so cast to the XPath query service
    XPathQueryService xp_service = (XPathQueryService) col.getService("XPathQueryService", "1.0");
    ResourceSet resultSet = xp_service.query(xpath);

Updating using XUpdate

    import org.xmldb.api.modules.XUpdateQueryService;

    String update = "XUPDATE STRING GOES HERE"; // placeholder for an XUpdate document
    // Uses the same collection handle (col) as the other snippets
    XUpdateQueryService xupdate_service = (XUpdateQueryService) col.getService("XUpdateQueryService", "1.0");
    xupdate_service.update(update);

The Exceptions
- All interactions with a Xindice database can throw an XMLDBException:

    try {
        // any Xindice interactions go here
    } catch (XMLDBException e) {
        // DatabaseAccessException is not an XML:DB API class; it is the service's own exception type
        DatabaseAccessException f = new DatabaseAccessException(e.getMessage());
        successful_update = false;
        throw f;
    }

Current work
- Grid XML DatabaseManagementService
  – Supports the Grid Service Factory PortType
  – Creates GXDSes upon request
    • To service a given user connection to the db
    • To run a particular, long-lived query and then close
  – Initial prototype will look at creating GXDSes within the same hosting environment (i.e. tomcat)
- Grid XML DeliverySystem
  – “Run my query, but don’t send the results back to me, send them to this guy”
  – Look initially at simple SOAP message forwarding
  – Later DeliverySystems will support GridFTP

Medium term
- Formal review Aug–Sep (post GGF5)
  – Refine GDS ideas with GXDS and GRDS discoveries
- Seek detailed feedback on strawmen from UK e-Science
  – AstroGrid, MyGrid, GridPP, RealityGrid, eDIKT
- Develop revised GDS specs
- Develop reference implementations
  – XMLDBs, RDBs, semi-structured file data (?)
  – Java/SOAP (the default)
  – Enterprise Java Beans, Microsoft .NET framework (?)

Summary
- XML will be the metadata language for the Grid
- XML databases are becoming increasingly important for complex metadata
- OGSA-DAI up and running at EPCC & IBM
- Early Grid XML Data Services spec defined
- Early demonstration prototype working
- Harder (but more interesting!) bits ongoing
- Formal reference implementations to start Q4
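The record-schema evolution rule from the “Record schema?” slide works like this: validate every stored document against the proposed schema, and reject the change if any document fails. The sketch below shows that rule using the JDK’s standard javax.xml.validation API. It is a minimal illustration under stated assumptions, not the GXDS schemaUpdate implementation. The class RecordSchemaCheck, its method acceptSchemaUpdate, and the sample schemas and documents are all hypothetical. The sketch also assumes that the record schema is a W3C XML Schema and that the stored documents are already available as strings. The real service would fetch them from Xindice.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical sketch of the record-schema evolution rule; not the GXDS schemaUpdate code.
public class RecordSchemaCheck {

    // Hypothetical compatible change: an optional <phone> element is added.
    static final String COMPATIBLE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='email' type='xs:string' minOccurs='0'/>"
        + "<xs:element name='phone' type='xs:string' minOccurs='0'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Hypothetical breaking change: <email> becomes mandatory.
    static final String BREAKING_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='person'><xs:complexType><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "<xs:element name='email' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Hypothetical documents already stored in the database.
    static final List<String> STORED_DOCS = Arrays.asList(
        "<person><name>Ann</name></person>",
        "<person><name>Bob</name><email>bob@example.org</email></person>");

    /** Accepts the candidate schema only if every stored document validates against it. */
    static boolean acceptSchemaUpdate(String candidateXsd, List<String> storedDocs) {
        Validator validator;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(candidateXsd)));
            validator = schema.newValidator();
        } catch (SAXException e) {
            return false; // the candidate is not a usable XML Schema
        }
        for (String doc : storedDocs) {
            try {
                // With no ErrorHandler set, validation errors are thrown as exceptions
                validator.validate(new StreamSource(new StringReader(doc)));
            } catch (SAXException | IOException e) {
                return false; // one failing document is enough to reject the new schema
            }
        }
        return true; // every existing document still conforms to the candidate schema
    }

    public static void main(String[] args) {
        System.out.println("compatible change accepted: " + acceptSchemaUpdate(COMPATIBLE_XSD, STORED_DOCS));
        System.out.println("breaking change accepted: " + acceptSchemaUpdate(BREAKING_XSD, STORED_DOCS));
    }
}
```

Running main prints that the compatible change (a new optional element) is accepted. The breaking change is rejected, because Ann’s record has no email and would fail the newly mandatory element. Checking every document against the candidate schema before switching to it is what keeps the database backwards compatible in the sense the slide describes.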