Grid@epcc
Joining the dots

Dr Chris Maynard
EPCC
c.maynard@ed.ac.uk
+44 131 650 5077
Edinburgh - Tsukuba Workshop, 22/02/2009

Introduction
• The ideas of grid computing are everywhere, even if the actual grids are not as pervasive
• There are many grid middleware packages with overlapping functionality
• No universal solution
• Each project requires some glue to tie components together

Outline
• Three example projects at EPCC
• OGSA-DAI
• BEinGRID
• ILDG

Challenges
• Diversity
  – Data resource types, vendors, middleware, schema, metadata
• Scale
  – Collections, formats, volumes, geographical, political and social distance
• Ownership
  – On individual, group, and organisational levels
• Security
  – Client, service and data owners

Sharing data
• Convert data into information
• Reveal new insights
  – Scientific knowledge
  – Business advantage
• Data mining across distributed data resources
  – Exploit public and private data
• Open or closed communities
  – Scientific collaborations
  – Business partnerships

OGSA-DAI
• OGSA-DAI
  – 02/2002 – 07/2003
  – EPCC, NeSC, IBM, Oracle, NEReSC, eSNW
  – DTI/EPSRC via UK e-Science Grid Core Programme
• DAIT (DAI-Two)
  – 10/2003 – 10/2005
  – EPCC, NeSC, IBM, NEReSC, eSNW
  – DTI/EPSRC via UK e-Science Grid Core Programme 2 as part of the OMII-UK project
• OMII-UK
  – 11/2005 – 04/2009
  – EPCC, NeSC
  – EPSRC
• OMII-UK extension
  – 04/2009 – 04/2010
  – EPCC, NeSC
  – EPSRC

Workflows
[Figure: workflow diagram. Legend: target data resource, activity, activity input, activity output.]
• Input query (French): SELECT Pays, Capital FROM Pays
• English-language data resource
  – Convert query from French to English: SELECT Country, Capital FROM Countries
  – Run SQL query: (UK, London), (France, Paris)
  – Convert data from English to French: (Grande-Bretagne, Londres), (France, Paris)
• Spanish-language data resource
  – Convert query from French to Spanish: SELECT País, Capital FROM Países
  – Run SQL query: (España, Madrid), (Italia, Roma)
  – Convert data from Spanish to French: (l'Espagne, Madrid), (l'Italie, Rome)
• Join the data

  Pays             Capital
  Grande-Bretagne  Londres
  France           Paris
  l'Espagne        Madrid
  l'Italie         Rome

ADMIRE
• Advanced Data Mining and Integration Research for Europe
  – EU 7th Framework Programme project
  – EPCC, NeSC and European partners
• Infrastructure for data integration and mining
  – Large scale enterprise systems
• Applications
  – Flood modelling and simulations
  – Customer relationship management

GEOGrid
• Global Earth Observation (GEO) Grid
  – National Institute of Advanced Industrial Science and Technology, Japan
• Geo-spatial data and services
  – Disaster mitigation
  – Environmental monitoring
  – Natural resource exploration
  – Virtual integration and access control
• Data
  – Satellite imagery
  – Geological data
  – Ground-sensed data

SEE-GEO – geo-linking portal
[Figure: data flow between the GLS Portal, Maps, MIMAS Census, UK BORDERS, OGSA-DAI (Get, Get, Join, Transform, Deliver) and the Image Creation Service.]
1: GLSQuery submitted via portal, e.g. “Leeds population distribution by census output area”
2: Workflow is populated with query parameters and run
3: Image is placed on a map server
4: URL of image is returned to portal – avoids costly SOAP/HTTP transfer of image
5: Portal gets image using URL

BEinGRID
• Type of project: Integrated Project
• Project coordinator: ATOS ORIGIN
• Project start date: 1st June 2006
• Duration: 42 months
• Max EC contribution: 15.7 M euros
• Consortium: 99 partners
http://www.beingrid.eu/
http://www.it-tude.com/

BEinGRID Vision
• Typical Technology Transfer project:
  – 2 waves of 18+7 Business Experiments involving:
    – SMEs in various industry sectors
    – Technical and Business experts
  – Set up a repository of Grid solutions, available free/at cost to the respective sectors
  – Prove that businesses will benefit from the adoption of Grid technologies

BE02 – FilmGrid
• “Movie post-production workflow”
• Reviewing data flow in the industry
  – Current data movement tied into celluloid shooting
  – What is the effect of digital capture?
  – How useful is Sohonet other than for email?
• The FilmGrid prototype proves:
  – Grid technology is highly appropriate for movie post-production
  – Potentially large gains in:
    – Efficiency
    – Reliability
    – Accountability
    – Accessibility
• http://tinyurl.com/filmgrid

Asset Manager
[Screenshot: Asset Manager, with panels for Local Files, Transfer Status and Global Assets.]

Database Triggers
• Procedure to be executed when a modification is made to a table
  – INSERT, UPDATE or DELETE
• Various use cases
  – Log changes
  – Execute business rules (e.g. email a manager when online orders push stock levels below a specified threshold)
  – Enforce business rules (e.g. all invoices must be associated with a valid customer)
• How to set up a trigger depends on the DB implementation

OGSA-DAI Trigger
• Uses database triggers to call an OGSA-DAI workflow upon modification to a database
• Extends single-database trigger functionality to:
  – Span several, heterogeneous databases
  – Execute powerful OGSA-DAI workflows
• Many possible use cases
  – Synchronising databases
  – Logging to an external database
  – Ensuring or executing business logic across partners
http://tinyurl.com/ogsadaitrigger

BE24 – GRID2(B2B)
• “Grid technologies for affordable data synchronization and SME integration within B2B networks”
• Empowering existing B2B networks by electronically connecting suppliers at an affordable price
  – Web-services-based add-on to allow data exchange at database level
  – Uses OGSA-DAI Trigger to automate synchronization
• The GRID2(B2B) prototype demonstrates:
  – Easy integration with multiple B2B platforms
  – User in total control of what data is sent
  – Automated synchronization:
    • Fast and frequent data transfer
    • Removes the need to enter data twice
• http://tinyurl.com/grid2b2b

How does it work?
[Figure: Ducati (Starter) and Bentivogli (Partner), each with a DBMS and a GRID2(B2B) Data Service, connected through the GRID2(B2B) Data Federation Agent to the DBMS of the MaNeM B2B Platform.]
• New orders generated by Ducati software
• Orders written to an internal database
• OGSA-DAI Trigger used to monitor for new data
• Data Service communicates the new information to the Data Federation Agent
• Data Federation Agent inserts information into B2B database
• Data Federation Agent also monitors for new data in the B2B platform and propagates it on to the correct member of the network
• Data Service and Data Federation Agent are configured using the GRID2(B2B) Configurator

International Lattice Data Grid
• Sharing Lattice QCD data
• ILDG has no formal role
  – groups collaborate informally
  – working groups for metadata and middleware
• Individual groups were already starting to build data grid infrastructures
  – UKQCD – QCDgrid, later DiGS
  – German groups combined into LATFOR, grid arm is LDG
  – US groups formed USQCD
  – Japanese – JLDG
  – Australia – Web portal
• Middleware often dictated by national considerations
  – ILDG is an aggregation of existing grids
  – Interoperable

ILDG WG
• Edinburgh and Tsukuba personnel
• Metadata Working Group
  – Tomoteru Yoshie, previous convener
  – Chris Maynard, current convener
• Middleware Working Group
  – George Beckett, Daragh Byrne, Eilidh Grant, Radek Ostrowski, and James Perry
  – Mitsuhisa Sato, Toshiyuki Amagassa, Osamu Tatebe
• Example of Tsukuba and Edinburgh active collaboration

Three requisite conditions
• Trust
  – already established in the community
  – known community
• Altruism
  – political will to make data available
  – effort to build infrastructure
  – effort actually making data available
• Reward
  – how to credit those making data available
  – data users should cite a designated paper

Three ideas to make this work
• Standard data format
  – Doesn’t really matter what, as long as one can read and write
  – configurations: SciDAC LIME record is 3x3 NERSC data layout
• Standard metadata
  – Semantic description of the data
  – Can be processed by an application
• Standard interfaces to services
  – Queries to metadata catalogues (MDC)
  – Queries to File Catalogue Web services (FC)
  – Authentication and authorisation

Architecture
[Figure: architecture diagram.]

Summary
• Rise in data complexity
  – doing things by hand is no longer scalable
  – we need tools to automate logistics and glue systems and data together
• Grid architecture sits on top of existing systems
  – can access remote data with local tools
  – Many different middleware stacks
  – Effort required to ensure interoperability
• Tsukuba and Edinburgh have already collaborated successfully on ILDG

Lunch
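
Appendix: trigger sketch

The "Database Triggers" and "OGSA-DAI Trigger" slides describe a procedure that fires when a table is modified, and the idea of extending that reaction to a second database. The following is a minimal sketch of that pattern only, and it does not use or reproduce the OGSA-DAI API. It assumes SQLite through Python's standard sqlite3 module. The table names, the stock threshold of 5, and the propagate_alerts helper are all hypothetical, chosen for illustration; the helper is a plain function standing in for a workflow that would span databases.

```python
import sqlite3


def make_shop_db():
    """Create an in-memory database with a stock table and two triggers."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE stock (item TEXT PRIMARY KEY, level INTEGER NOT NULL);
        CREATE TABLE change_log (item TEXT, old_level INTEGER, new_level INTEGER);
        CREATE TABLE alerts (item TEXT, level INTEGER);

        -- Use case "log changes": record every update to a stock level.
        CREATE TRIGGER log_stock_change AFTER UPDATE OF level ON stock
        BEGIN
            INSERT INTO change_log VALUES (OLD.item, OLD.level, NEW.level);
        END;

        -- Use case "execute business rules": raise an alert when the level
        -- drops below a threshold (5 is an assumed value for this sketch).
        CREATE TRIGGER low_stock_alert AFTER UPDATE OF level ON stock
        WHEN NEW.level < 5
        BEGIN
            INSERT INTO alerts VALUES (NEW.item, NEW.level);
        END;
    """)
    return db


def propagate_alerts(source, partner):
    """Copy pending alerts from the source database to a partner database.

    Hypothetical stand-in for a cross-database workflow: a single-database
    trigger cannot write to another database, so a separate step moves the
    rows that the trigger produced.
    """
    rows = source.execute("SELECT item, level FROM alerts").fetchall()
    partner.executemany("INSERT INTO partner_alerts VALUES (?, ?)", rows)
    source.execute("DELETE FROM alerts")  # each alert is forwarded once
    return len(rows)


def demo():
    shop = make_shop_db()
    partner = sqlite3.connect(":memory:")
    partner.execute("CREATE TABLE partner_alerts (item TEXT, level INTEGER)")

    shop.execute("INSERT INTO stock VALUES ('starter', 10)")
    shop.execute("UPDATE stock SET level = 7 WHERE item = 'starter'")  # logged only
    shop.execute("UPDATE stock SET level = 3 WHERE item = 'starter'")  # logged, alert

    copied = propagate_alerts(shop, partner)
    log = shop.execute("SELECT * FROM change_log ORDER BY rowid").fetchall()
    received = partner.execute("SELECT * FROM partner_alerts").fetchall()
    return copied, log, received


if __name__ == "__main__":
    print(demo())
```

Calling demo() updates the stock level twice. Both updates are written to change_log, but only the second crosses the threshold, so a single alert reaches the partner database. The sketch keeps the trigger and the propagation step separate because the trigger syntax and capabilities depend on the database implementation, while the forwarding logic does not.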