A Bright Future with OGSA Data Services
  Malcolm Atkinson, Director
  www.nesc.ac.uk
  7th June 2004

OGSA-DAI
  Analyst sends a request to the Registry for sources of data about "x"
  Registry responds with Factory handle
  Request to Factory for access to database
  Factory creates GridDataService
  Factory returns handle of GDS to client
  Client queries GDS with SQL, XPath, XQuery etc.
  GDS interacts with database
  Query results returned as XML, or delivered to consumer as XML
  [Diagram labels: Analyst; Registry (GDSR); Factory (GDSF); Grid Data Service (GDS); Consumer; Database (Xindice, MySQL, Oracle, DB2); SOAP/HTTP; service creation; API interactions]

Extensibility
  Data resources
    Unbounded variety
  Data access languages
    Established standards
    With many variants
    SQL, OQL, semi-structured query, domain languages
  Investment in DBs, DBMSs, File Stores, Bulk stores, …
    Not sensible to expect them to change to fit us
  Data Access Models must be extensible
    Static extension used extensively by OGSA-DAI users
  [Callout: Should extensibility be supported by foundation interfaces?]

Move Computation to Data
  [Callout: Increasingly necessary]
  Code scale
    Depends on wet-ware
    No noticeable rate of improvement
  Data scale
    Grows at Moore's Law or Moore's Law squared
  Analysis of data
    Extracts & derivatives used
    Often smaller – more value for current investigation
  Implies move code to data
    Application control or higher-level service decisions
    SQL, XQuery, Java code, …
  Extensibility mechanisms used by OGSA-DAIers
    Java mobility (e.g. DataCutter), database procedures, …

Integration is Everything
  No business or research team is satisfied with one data resource
  Federation or Virtualisation
  Domain-specialist driven
    Dynamic specification of combination function
    Iterative processes – range of time scales
  Sources inevitably heterogeneous
    Content, structure & policies time-varying
  Robust & stable steerable integration services
    Higher-level services over multiple resources
    Fundamental requirements for (re)negotiation
  [Callout: preceding integration, or kit of integration tools to be interwoven with an application?]

Multiple tasks / request
  [Diagram labels: Client; Requestor; API; Stub; Data Set; records of (Ident, Type, Value) numbered 0 to 7]

Be Direct
  Double Handling costs too much
    Memory cycles, bus capacity, cache disruption, …
    Double Handling via discs pathologically bad
  Data translation expensive
    Avoid or compose
  Main memory is not big enough
  Couple generator & consumer directly
    Data pipe from RAM to RAM
  [Callout: Breaks down boundaries and merges data, execution & transport requirements. Demands smart workflow enactment. Requires coupled computation execution service & foundation services]
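
Illustrative note on "Couple generator & consumer directly": the sketch below contrasts double handling via disc with a direct in-memory pipe. It is a minimal illustration and not part of OGSA-DAI. The names generate_records, consume, via_disc and direct are hypothetical, and a Python generator stands in for whatever transport a real workflow enactment service would supply.

```python
import os
import tempfile
from typing import Iterable, Iterator


def generate_records(n: int) -> Iterator[int]:
    """Hypothetical data generator: yields one record at a time."""
    for i in range(n):
        yield i * i


def consume(records: Iterable[int]) -> int:
    """Hypothetical consumer: reduces the stream to a single summary value."""
    total = 0
    for value in records:
        total += value
    return total


def via_disc(n: int) -> int:
    """Double handling: write every record to a file, then read it all back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        with os.fdopen(fd, "w") as out:
            for value in generate_records(n):
                out.write(f"{value}\n")  # translation to text on the way out
        with open(path) as src:
            return consume(int(line) for line in src)  # and back again
    finally:
        os.remove(path)


def direct(n: int) -> int:
    """Direct coupling: the consumer pulls records straight from the generator."""
    return consume(generate_records(n))


if __name__ == "__main__":
    print(direct(1000), via_disc(1000))
```

Both paths compute the same result. The direct path holds only one record in memory at a time and skips the write, read and text-translation steps, which is the cost the slide argues against.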