Grid@epcc Joining the dots Dr Chris Maynard EPCC

Grid@epcc
Joining the dots
Dr Chris Maynard
EPCC
c.maynard@ed.ac.uk
+44 131 650 5077
Introduction
• The ideas of grid computing are everywhere
– even if the actual grids are not as pervasive
• There are many grid middleware packages
– with overlapping functionality
• No universal solution
– Each project requires some glue to tie components together
22/02/2009
Edinburgh - Tsukuba Workshop
2
Outline
• Three example projects at EPCC
• OGSA-DAI
• BEinGRID
• ILDG
Challenges
• Diversity
– Data resource types, vendors, middleware, schema, metadata
• Scale
– Collections, formats, volumes, geographical, political and
social distance
• Ownership
– On individual, group, and organisational levels
• Security
– Client, service and data owners
Sharing data
• Convert data into information
• Reveal new insights
– Scientific knowledge
– Business advantage
• Data mining across distributed data resources
– Exploit public and private data
• Open or closed communities
– Scientific collaborations
– Business partnerships
OGSA-DAI
• OGSA-DAI – 02/2002 – 07/2003
– EPCC, NeSC, IBM, Oracle, NEReSC, eSNW
– DTI/EPSRC via UK e-Science Grid Core Programme
• DAIT (DAI-Two) – 10/2003 – 10/2005
– EPCC, NeSC, IBM, NEReSC, eSNW
– DTI/EPSRC via UK e-Science Grid Core Programme 2 as part of the OMII-UK project
• OMII-UK – 11/2005 – 04/2009
– EPCC, NeSC
– EPSRC
• OMII-UK extension – 04/2009 – 04/2010
– EPCC, NeSC
– EPSRC
Workflows
• Diagram labels: activity, activity input, activity output, target data resource
• Example: one French query answered from an English and a Spanish data resource
– Query: SELECT Pays, Capital FROM Pays
– Convert query from French to English: SELECT Country, Capital FROM Countries
– Run SQL query: (UK, London), (France, Paris)
– Convert data from English to French: (Grande-Bretagne, Londres), (France, Paris)
– Convert query from French to Spanish: SELECT País, Capital FROM Países
– Run SQL query: (España, Madrid), (Italia, Roma)
– Convert data from Spanish to French: (l'Espagne, Madrid), (l'Italie, Rome)
– Join the data: (Grande-Bretagne, Londres), (France, Paris), (l'Espagne, Madrid), (l'Italie, Rome)
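The workflow slide can be read as a chain of activities. The following is a minimal sketch of that chain in plain Python with sqlite3, not the OGSA-DAI API: the function names, the in-memory databases and the small translation tables are all assumptions made for illustration.

```python
import sqlite3

def make_resource(table, columns, rows):
    """Create an in-memory database standing in for one target data resource."""
    db = sqlite3.connect(":memory:")
    cols = ", ".join(f'"{c}" TEXT' for c in columns)
    db.execute(f'CREATE TABLE "{table}" ({cols})')
    db.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    return db

def convert_query(columns, table, names):
    """Activity: translate the French column and table names of a query."""
    cols = ", ".join(f'"{names["columns"][c]}"' for c in columns)
    return f'SELECT {cols} FROM "{names["tables"][table]}"'

def convert_data(rows, words):
    """Activity: translate result values into French; unknown words pass through."""
    return [tuple(words.get(value, value) for value in row) for row in rows]

def run_workflow():
    """Run the French query against both resources and join the results."""
    english = make_resource("Countries", ["Country", "Capital"],
                            [("UK", "London"), ("France", "Paris")])
    spanish = make_resource("Países", ["País", "Capital"],
                            [("España", "Madrid"), ("Italia", "Roma")])
    # Hypothetical translation tables, one pair per branch of the workflow.
    branches = [
        (english,
         {"columns": {"Pays": "Country", "Capital": "Capital"},
          "tables": {"Pays": "Countries"}},
         {"UK": "Grande-Bretagne", "London": "Londres"}),
        (spanish,
         {"columns": {"Pays": "País", "Capital": "Capital"},
          "tables": {"Pays": "Países"}},
         {"España": "l'Espagne", "Italia": "l'Italie", "Roma": "Rome"}),
    ]
    joined = []
    for db, names, words in branches:
        sql = convert_query(["Pays", "Capital"], "Pays", names)  # convert query
        rows = db.execute(sql).fetchall()                        # run SQL query
        joined.extend(convert_data(rows, words))                 # convert data
    return joined                                                # join the data
```

Calling run_workflow() returns the four French rows from the slide's final table. Each step is a separate function so that the output of one activity is passed as the input of the next, as in the diagram.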
ADMIRE
• Advanced Data Mining and Integration Research for
Europe
– EU 7th Framework Programme project
– EPCC, NeSC and European partners
• Infrastructure for data integration and mining
– Large scale enterprise systems
• Applications
– Flood modelling and simulations
– Customer relationship management
GEOGrid
• Global Earth Observation (GEO) Grid
– National Institute of Advanced Industrial Science and Technology,
Japan
• Geo-spatial data and services
– Disaster mitigation
– Environmental monitoring
– Natural resource exploration
– Virtual integration and access control
• Data
– Satellite imagery
– Geological data
– Ground-sensed data
SEE-GEO – geo-linking portal
• Diagram components: Portal, GLS, Maps, MIMAS Census, UK BORDERS, Image Creation Service
• OGSA-DAI workflow activities: Get, Get, Join, Transform, Deliver
1: GLSQuery submitted via portal, e.g. “Leeds population distribution by census output area”
2: Workflow is populated with query parameters and run
3: Image is placed on a map server
4: URL of image is returned to portal – avoids costly SOAP/HTTP transfer of image
5: Portal gets image using URL
BEinGRID
• Type of project: Integrated Project
• Project coordinator: ATOS ORIGIN
• Project start date: 1st June 2006
• Duration: 42 months
• Max EC contribution: 15.7 M euros
• Consortium: 99 partners
http://www.beingrid.eu/
http://www.it-tude.com/
BEinGRID Vision
• Typical Technology Transfer project:
– 2 waves of 18+7 Business Experiments involving:
– SMEs in various industry sectors
– Technical and Business experts
– Set up a repository of Grid solutions, available free/at cost to
the respective sectors
– Prove that businesses will benefit from the adoption of Grid
technologies
BE02 – FilmGrid
• “Movie post-production workflow”
• Reviewing data flow in the industry
– Current data movement tied into celluloid shooting
– What is the effect of digital capture?
– How useful is Sohonet other than for email?
• The FilmGrid prototype proves:
– Grid technology is highly appropriate for movie post-production
– Potentially large gains in:
– Efficiency
– Reliability
– Accountability
– Accessibility
• http://tinyurl.com/filmgrid
Asset Manager
• Screenshot panels: Local Files, Transfer Status, Global Assets
Database Triggers
• Procedure to be executed when a modification is
made to a table
– INSERT, UPDATE or DELETE
• Various use cases
– Log changes
– Execute business rules (e.g. email a manager when online orders push stock levels below a specified threshold)
– Enforce business rules (e.g. all invoices must be associated with a valid customer)
• How to set up a trigger depends on the DB implementation
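As a concrete illustration of the use cases above, here is a minimal sketch using SQLite through Python's sqlite3 module. The table names and the "no negative stock" rule are assumptions chosen for the example, and the trigger syntax shown is SQLite's; other databases set triggers up differently.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE stock (item TEXT PRIMARY KEY, level INTEGER);
    CREATE TABLE change_log (item TEXT, old_level INTEGER, new_level INTEGER);

    -- Log changes: record every update to a stock level.
    CREATE TRIGGER log_stock_change AFTER UPDATE OF level ON stock
    BEGIN
        INSERT INTO change_log VALUES (OLD.item, OLD.level, NEW.level);
    END;

    -- Enforce a (hypothetical) business rule: stock may not go negative.
    CREATE TRIGGER no_negative_stock BEFORE UPDATE OF level ON stock
    WHEN NEW.level < 0
    BEGIN
        SELECT RAISE(ABORT, 'stock level cannot be negative');
    END;
"""

def demo():
    """Apply one valid and one invalid update; return the log and the outcome."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO stock VALUES ('widget', 10)")
    db.execute("UPDATE stock SET level = 4 WHERE item = 'widget'")
    try:
        db.execute("UPDATE stock SET level = -1 WHERE item = 'widget'")
        rejected = False
    except sqlite3.DatabaseError:
        # The BEFORE trigger aborted the statement, so nothing was changed.
        rejected = True
    log = db.execute(
        "SELECT item, old_level, new_level FROM change_log").fetchall()
    return log, rejected
```

The valid update is logged by the AFTER trigger, while the invalid one is stopped by the BEFORE trigger and leaves no log entry.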
OGSA-DAI Trigger
• Uses database triggers to call an OGSA-DAI
workflow upon modification to a database
• Extends single-database trigger functionality to:
– Span several, heterogeneous databases
– Execute powerful OGSA-DAI workflows
• Many possible use cases
– Synchronising databases
– Logging to an external database
– Ensuring or executing business logic across partners
http://tinyurl.com/ogsadaitrigger
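The idea of a trigger that reaches beyond its own database can be sketched as follows. This is not the OGSA-DAI Trigger interface: it is an assumption-laden stand-in that uses SQLite's user-defined functions so that a trigger can call a Python function, and that function plays the role of a workflow writing to a second database. All names (run_workflow, sync_workflow, orders, partner_orders) are hypothetical.

```python
import sqlite3

def sync_workflow(partner, order_id, item):
    """Stand-in for a workflow: copy the new order into a second database."""
    partner.execute("INSERT INTO partner_orders VALUES (?, ?)",
                    (order_id, item))

def connect_with_trigger(partner):
    """Create the source database and wire its trigger to the workflow."""
    source = sqlite3.connect(":memory:")

    def run_workflow(order_id, item):
        sync_workflow(partner, order_id, item)
        return 1  # SQL functions must return a value

    # Expose the Python callable to SQL so the trigger can call out.
    source.create_function("run_workflow", 2, run_workflow)
    source.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
        CREATE TRIGGER on_new_order AFTER INSERT ON orders
        BEGIN
            SELECT run_workflow(NEW.id, NEW.item);
        END;
    """)
    return source

def demo():
    """Insert one order at the source and read it back from the partner."""
    partner = sqlite3.connect(":memory:")
    partner.execute("CREATE TABLE partner_orders (id INTEGER, item TEXT)")
    source = connect_with_trigger(partner)
    source.execute("INSERT INTO orders (item) VALUES ('starter motor')")
    return partner.execute("SELECT id, item FROM partner_orders").fetchall()
```

Here both databases are SQLite and live in one process. The point of the sketch is only the control flow: a modification fires a trigger, and the trigger hands the new row to code that can act on other data resources.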
BE24 – GRID2(B2B)
• “Grid technologies for affordable data synchronization and SME
integration within B2B networks”
• Empowering existing B2B networks by electronically connecting
suppliers at an affordable price
– Webservices-based add-on to allow data exchange at database
level
– Uses OGSA-DAI Trigger to automate synchronization
• The GRID2(B2B) prototype demonstrates:
– Easy integration with multiple B2B platforms
– User in total control of what data is sent
– Automated synchronization:
• Fast and frequent data transfer
• Remove the need to enter data twice
• http://tinyurl.com/grid2b2b
How does it work?
• Diagram: Ducati (Starter), Bentivogli (Partner), MaNeM (B2B Platform), each with a DBMS and a GRID2(B2B) Data Service; GRID2(B2B) Data Federation Agent
– New orders generated by Ducati software
– Orders written to an internal database
– OGSA-DAI Trigger used to monitor for new data
– Data Service communicates the new information to the Data Federation Agent
– Data Federation Agent inserts information into B2B database
– Data Federation Agent also monitors for new data in the B2B platform and propagates it on to the correct member of the network
• Data Service and Data Federation Agent are configured using the GRID2(B2B) Configurator
International Lattice Data Grid
• Sharing Lattice QCD data
• ILDG has no formal role
– groups collaborate informally
– working groups for metadata and middleware
• Individual groups were already starting to build data grid infrastructures
– UKQCD: QCDgrid, later DiGS
– German groups combined into LATFOR, grid arm is LDG
– US groups formed USQCD
– Japanese: JLDG
– Australia: Web portal
• Middleware often dictated by national considerations
– ILDG is an aggregation of existing grids
– Interoperable
ILDG WG
• Edinburgh and Tsukuba personnel
• Metadata Working Group
– Tomoteru Yoshie: previous convener
– Chris Maynard: current convener
• Middleware Working Group
– George Beckett, Daragh Byrne, Eilidh Grant, Radek Ostrowski, and
James Perry
– Mitsuhisa Sato, Toshiyuki Amagasa, Osamu Tatebe
• Example of Tsukuba and Edinburgh active collaboration
Three requisite conditions
• Trust
– already established in the community
– known community
• Altruism
– political will to make data available
– effort to build infrastructure
– effort actually making data available
• Reward
– how to credit those making data available
– data users should cite a designated paper
Three ideas to make this work
• Standard data format
– Doesn’t really matter what, as long as one can read and write
– configurations: SciDAC LIME record is 3x3 NERSC data layout
• Standard metadata
– Semantic description of the data
– Can be processed by an application
• Standard interfaces to services
– Queries to metadata catalogues (MDC)
– Queries to File Catalogue Web services (FC)
– Authentication and authorisation
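How the three ideas fit together can be sketched as below: a metadata catalogue is queried for data matching a description, and a file catalogue then resolves each logical file name to the places the file is stored. This is an illustrative sketch, not the ILDG service interface. The class names, XML element names, logical file names and URLs are all hypothetical, and authentication is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata documents; element names are illustrative only.
METADATA_DOCS = [
    "<configuration><lfn>lfn://example/ensA/cfg_0010</lfn>"
    "<ensemble>ensA</ensemble></configuration>",
    "<configuration><lfn>lfn://example/ensA/cfg_0020</lfn>"
    "<ensemble>ensA</ensemble></configuration>",
    "<configuration><lfn>lfn://example/ensB/cfg_0010</lfn>"
    "<ensemble>ensB</ensemble></configuration>",
]

class MetadataCatalogue:
    """Stand-in for an MDC: holds machine-readable metadata records."""

    def __init__(self, documents):
        self._records = [ET.fromstring(doc) for doc in documents]

    def query(self, field, value):
        """Return logical file names of records whose field equals value."""
        return [r.findtext("lfn") for r in self._records
                if r.findtext(field) == value]

class FileCatalogue:
    """Stand-in for an FC: maps a logical file name to replica URLs."""

    def __init__(self, replicas):
        self._replicas = replicas

    def locate(self, lfn):
        return self._replicas.get(lfn, [])

def find_replicas(mdc, fc, field, value):
    """Query the metadata first, then resolve each match to its replicas."""
    return {lfn: fc.locate(lfn) for lfn in mdc.query(field, value)}

mdc = MetadataCatalogue(METADATA_DOCS)
fc = FileCatalogue({
    "lfn://example/ensA/cfg_0010": ["https://storage.example.org/a/cfg_0010"],
    "lfn://example/ensA/cfg_0020": ["https://storage.example.org/a/cfg_0020",
                                    "https://mirror.example.org/a/cfg_0020"],
})
```

Because the metadata has a standard form, the same query code works regardless of which group produced the data. Because the catalogues sit behind standard interfaces, a client does not need to know which middleware stores the files.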
Architecture
Summary
• Rise in data complexity
– doing things by hand is no longer scalable
– we need tools to automate logistics and glue systems and data
together
• Grid architecture sits on top of existing systems
– can access remote data with local tools
– Many different middleware stacks
– Effort required to ensure interoperability
• Tsukuba and Edinburgh have already collaborated successfully on ILDG
Lunch