Supercomputing, Visualization & eScience
Manchester Computing
What is e-Science & What is the Grid?
W T Hewitt
UCISA Meeting
Edinburgh
Agenda
 What is Grid & eScience?
 The Global Programme
 The UK eScience Programme
 Impacts
What is e-Science & the Grid?
Why Grids?
 Large-scale science and engineering are done through
– the interaction of people,
– heterogeneous computing resources, information systems, and instruments,
– all of which are geographically and organizationally dispersed.
 The overall motivation for “Grids” is to facilitate the routine interactions
of these resources in order to support large-scale science and
engineering.
From Bill Johnston, July 01
The Grid…
 "…is the web on steroids."
 "…is Napster for Scientists" [of data grids]
 "…is the solution to all your problems."
 "…is evil." [a system manager, of Globus]
 "…is distributed computing re-badged."
 "…is distributed computing across multiple administrative domains"
– Dave Snelling, senior architect of UNICORE
 […provides] "Flexible, secure, coordinated resource
sharing among dynamic collections of individuals,
institutions, and resource"
– From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
 "…enables communities (“virtual organizations”) to share
geographically distributed resources as they pursue
common goals -- assuming the absence of central location,
central control, omniscience, existing trust relationships."
CERN: Large Hadron Collider (LHC)
Raw Data: 1 Petabyte / sec
Filtered: 100 MByte / sec = 1 Petabyte / year = 1 Million CD-ROMs
(The yearly figure assumes roughly 10^7 seconds of beam time per year; see the arithmetic check below.)
[Image: CMS Detector]
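A quick sanity check of those rates, as a Python sketch (the 10^7 figure is an assumed "accelerator year" of beam seconds, and 650 MB an assumed CD-ROM capacity, so the CD count is order-of-magnitude only):

    # Check the filtered-rate arithmetic quoted on the slide.
    filtered_rate = 100e6    # bytes/sec after the trigger filter
    beam_seconds = 1e7       # assumed seconds of beam time per year
    cd_capacity = 650e6      # assumed bytes per CD-ROM

    bytes_per_year = filtered_rate * beam_seconds
    print(bytes_per_year / 1e15)                # 1.0 -> one petabyte/year
    print(bytes_per_year / cd_capacity / 1e6)   # ~1.5 -> about a million CD-ROMs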
Why Grids?
 A biochemist exploits 10,000 computers to screen 100,000 compounds
in an hour;
 A biologist combines a range of diverse and distributed resources
(databases, tools, instruments) to answer complex questions;
 1,000 physicists worldwide pool resources for petaop analyses of
petabytes of data
 Civil engineers collaborate to design, execute, & analyze shake table
experiments
From Steve Tuecke 12 Oct. 01
Why Grids? (contd.)
 Climate scientists visualize, annotate, & analyze terabyte simulation
datasets
 An emergency response team couples real time data, weather model,
population data
 A multidisciplinary analysis in aerospace couples code and data in four
companies
 A home user invokes architectural design functions at an application
service provider
From Steve Tuecke 12 Oct. 01
Broader Context
 “Grid Computing” has much in common with major
industrial thrusts
– Business-to-business, Peer-to-peer, Application Service Providers, Storage
Service Providers, Distributed Computing, Internet Computing…
 Sharing issues not adequately addressed by existing
technologies
– Complicated requirements: “run program X at site Y subject to community
policy P, providing access to data at Z according to policy Q”
– High performance: unique demands of advanced & high-performance
systems
What is the Grid?
"Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation... we review the 'Grid problem', which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources -- what we refer to as virtual organizations."
From "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" by Foster, Kesselman and Tuecke
What is the Grid?
 Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
 On-demand, ubiquitous access to computing, data, and all kinds of
services
 New capabilities constructed dynamically and transparently from
distributed services
 No central location, No central control, No existing trust
relationships, Little predetermination
 Uniformity
 Pooling Resources
e-Science and the Grid
‘e-Science is about global collaboration in key areas of
science, and the next generation of infrastructure that will
enable it.’
‘e-Science will change the dynamic of the way science is
undertaken.’
John Taylor,
Director General of Research Councils,
Office of Science and Technology
Why GRID?
 VERY VERY IMPORTANT
 The GRID is one way to realise the e-Science vision.
 WE ARE TRYING TO DO E-SCIENCE!
Grid Middleware
[Layer diagram: diverse global services, layered on Grid services, layered on the local OS]
Common principles
 Single sign-on
– Often implying Public Key Infrastructure (PKI); see the sketch after this list
 Standard protocols and services
 Respect for autonomy of resource owner
 Layered architectures
 Higher-level infrastructures hiding heterogeneity of lower levels
 Interoperability is paramount
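A minimal Python sketch of the single sign-on idea: the user signs a short-lived proxy credential once, and every resource that trusts the user's long-term identity then accepts the proxy without another password prompt. Everything here is illustrative -- real GSI uses X.509 certificates and asymmetric keys, whereas this toy uses an HMAC as a stand-in signature, so signer and verifier share a key:

    import hashlib, hmac, time
    from dataclasses import dataclass

    @dataclass
    class ProxyCredential:
        """Illustrative stand-in for an X.509 proxy certificate."""
        subject: str      # identity the proxy acts for
        expires: float    # short lifetime limits exposure
        signature: str    # "signed" with the user's long-term key

    def sign_proxy(subject, long_term_key, lifetime_s=3600):
        # Done once per session: this is the single sign-on step.
        expires = time.time() + lifetime_s
        msg = f"{subject}|{expires}".encode()
        sig = hmac.new(long_term_key, msg, hashlib.sha256).hexdigest()
        return ProxyCredential(subject, expires, sig)

    def resource_accepts(proxy, long_term_key):
        # Each site verifies locally -- no password round-trips.
        msg = f"{proxy.subject}|{proxy.expires}".encode()
        expected = hmac.new(long_term_key, msg, hashlib.sha256).hexdigest()
        return hmac.compare_digest(proxy.signature, expected) and time.time() < proxy.expires

    key = b"users-long-term-secret"             # stands in for the private key
    proxy = sign_proxy("CN=W T Hewitt", key)    # sign on once...
    print(resource_accepts(proxy, key))         # ...use at many sites: True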
Grid Middleware

Middleware
 Globus
 UNICORE
 Legion and Avaki

Data
 Storage Resource Broker (SRB)
 Replica Management
 OGSA-DAI

Scheduling
 Sun Grid Engine
 Load Sharing Facility (LSF)
– from Platform Computing
 OpenPBS and PBS(Pro)
– from Veridian
 Maui scheduler
 Condor
– could also go under middleware; see the matchmaking sketch below

Web services (WSDL, SOAP, UDDI)
 IBM WebSphere
 Microsoft .NET
 Sun Open Net Environment (Sun ONE)

PC Grids
 Peer-to-Peer computing
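Condor's distinctive idea is matchmaking: jobs and machines both advertise their properties and requirements, and a matchmaker pairs them. The Python sketch below illustrates the idea only; it is not Condor's ClassAd language or API, and every attribute name is invented for the example:

    # Toy Condor-style matchmaking: pair each job with a machine
    # whose advertised properties satisfy the job's requirements.
    machines = [
        {"name": "node01", "os": "linux", "memory_mb": 512,  "free": True},
        {"name": "node02", "os": "linux", "memory_mb": 2048, "free": True},
        {"name": "node03", "os": "irix",  "memory_mb": 4096, "free": False},
    ]
    jobs = [
        {"id": "job-1", "needs_os": "linux", "needs_memory_mb": 1024},
        {"id": "job-2", "needs_os": "linux", "needs_memory_mb": 256},
    ]

    def match(job, machine):
        return (machine["free"]
                and machine["os"] == job["needs_os"]
                and machine["memory_mb"] >= job["needs_memory_mb"])

    for job in jobs:
        chosen = next((m for m in machines if match(job, m)), None)
        if chosen:
            chosen["free"] = False            # claim the matched machine
            print(job["id"], "->", chosen["name"])
        else:
            print(job["id"], "stays queued")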
Data-oriented Grids
Data-oriented middleware
 Wide-area distributed file systems (e.g. AFS)
 Storage Resource Broker (SRB)
– UCSD and SDSC
– Provide transparent access to data storage
– Centralised architecture
– Motivated by experiences of HPC users, not database users
– Little enthusiasm from UK e-Science programme
 OGSA-DAI
– Database Access and Integration
– Strategic contribution of UK e-Science programme
– Universities of Edinburgh, Manchester, Newcastle; IBM, Oracle
– Alpha release January 2003
– See the request-document sketch below
 Globus Replica Management software
– Next up!
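OGSA-DAI's interaction style is document-oriented: a client sends the data service an XML request describing the activities to perform (for example an SQL query) rather than opening a raw database connection. The Python sketch below mimics that style only; the endpoint, document format and element names are invented for illustration and are not the real OGSA-DAI interface:

    import urllib.request
    from xml.sax.saxutils import escape

    # Hypothetical endpoint for a database-access grid service.
    DATA_SERVICE = "http://example.org/dataservice/GenomeDB"

    def perform_document(sql):
        # Wrap an SQL query in an (invented) XML request document.
        return ("<?xml version=\"1.0\"?>\n"
                "<perform>\n"
                "  <sqlQuery id=\"query1\">" + escape(sql) + "</sqlQuery>\n"
                "</perform>").encode("utf-8")

    request = urllib.request.Request(
        DATA_SERVICE,
        data=perform_document("SELECT id FROM genes WHERE species = 'mouse'"),
        headers={"Content-Type": "text/xml"},
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode())   # XML document carrying the result rows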
Data Grids for
High Energy Physics
[Tier architecture, reconstructed from the original figure:]
 Online System: raw data from the detector at ~PBytes/sec; there is a "bunch crossing" every 25 nsecs, ~100 "triggers" per second, and each triggered event is ~1 MByte in size
 Tier 0 – CERN Computer Centre: fed at ~100 MBytes/sec; Offline Processor Farm of ~20 TIPS (1 TIPS is approximately 25,000 SpecInt95 equivalents)
 Tier 1 – Regional Centres (France, Germany, Italy; FermiLab ~4 TIPS): fed at ~622 Mbits/sec or by air freight (deprecated)
 Tier 2 – Tier2 Centres of ~1 TIPS each (e.g. Caltech ~1 TIPS): fed at ~622 Mbits/sec
 Institutes – institute servers (~0.25 TIPS) with a physics data cache; physicists work on analysis "channels", each institute having ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server
 Tier 4 – physicist workstations, fed at ~1 MBytes/sec
Data Intensive Issues Include …
 Harness [potentially large numbers of] data, storage,
network resources located in distinct administrative
domains
 Respect local and global policies governing what can be
used for what
 Schedule resources efficiently, again subject to local and
global constraints
 Achieve high performance, with respect to both speed and
reliability
 Catalog software and virtual data
Desired Data Grid Functionality
 High-speed, reliable access to remote data
 Automated discovery of "best" copy of data (see the sketch below)
 Manage replication to improve performance
 Co-schedule compute, storage, network
 "Transparency" wrt delivered performance
 Enforce access control on data
 Allow representation of "global" resource allocation policies
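To illustrate the "best copy" item above, a minimal Python sketch of replica selection: a catalogue maps a logical file name to its physical copies, and the client picks the copy with the best observed bandwidth. The catalogue entries and bandwidth figures are invented; real systems (e.g. Globus replica management) add authentication, consistency checks and richer cost models:

    # Hypothetical replica catalogue: logical name -> physical copies.
    replica_catalogue = {
        "lhc/run42/events.dat": [
            {"url": "gsiftp://cern.ch/data/events.dat",   "mbit_per_s": 622},
            {"url": "gsiftp://fnal.gov/data/events.dat",  "mbit_per_s": 155},
            {"url": "gsiftp://man.ac.uk/data/events.dat", "mbit_per_s": 34},
        ],
    }

    def best_replica(logical_name):
        # Pick the physical copy with the highest observed bandwidth.
        copies = replica_catalogue[logical_name]
        return max(copies, key=lambda c: c["mbit_per_s"])["url"]

    print(best_replica("lhc/run42/events.dat"))   # -> the CERN copy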
Grid Standards
 Grid Standards Bodies:
– IETF: Home of the Network Infrastructure Standards
– W3C: Home of the Web
– GGF: Home of the Grid
 GGF Defines the Open Grid Services Architecture
– OGSI is the Infrastructure part of OGSA
– OGSI Public comment draft submitted 14 February 2003
 Key OGSA Areas of Standards Development
– Job management interfaces
– Resources & Discovery
– Security
– Grid Economy and Brokering
What is OGSA?
"Web Services with Attitude!"
Also known as "Open Grid Services Architecture"
Aside: What are Web Services?
 Loosely Coupled Distributed Computing
– Think Java RMI or C remote procedure call
 Text Based Serialization
– XML: “Human Readable” serialization of objects
 IBM and Microsoft lead
– Web Services Description Language (WSDL)
– W3C Standardization
 Three Parts (see the SOAP example below)
– Messages (SOAP)
– Definition (WSDL)
– Discovery (UDDI)
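To make the "Messages" part concrete, a minimal Python sketch that builds a SOAP 1.1 envelope by hand and POSTs it over HTTP. The endpoint, namespace and operation are hypothetical; a real client would be generated from the service's WSDL instead:

    import urllib.request

    # Hypothetical service details -- in practice these come from the WSDL.
    ENDPOINT = "http://example.org/soap/WeatherService"
    BODY = """<?xml version="1.0" encoding="UTF-8"?>
    <soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body>
        <getTemperature xmlns="http://example.org/weather">
          <city>Edinburgh</city>
        </getTemperature>
      </soap:Body>
    </soap:Envelope>"""

    request = urllib.request.Request(
        ENDPOINT,
        data=BODY.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": '"getTemperature"'},
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode())   # the SOAP response envelope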
Web Services in Action
[Diagram, reconstructed from the original figure: a client (Java/C/browser) searches UDDI, where the service has published its WSDL. The client then calls the WS platform (InterStage, WebSphere, J2EE, GLUE, SunOne, .NET) over https/SOAP, and the platform talks to the legacy enterprise application or database over any protocol.]
Enter Grid Services
Experiences of Grid computing (and business process integration)
suggest similar extensions to Web Services
 State
– Service Data Model
 Persistence and Naming
– Two Level Naming (GSH, GSR); see the sketch below
– Allows dynamic migration and QoS adaptation
 Lifetime Management
– Self healing and ‘soft’ garbage collection.
 Standard PortTypes
– Guarantee of minimal level of service
– Beyond P2P is Federation through Mediation
 Explicit Semantics
– Grid Services specify semantics on top of Web Service syntax.
– PortType Inheritance
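A minimal Python sketch of the two-level naming idea in OGSI terms: a Grid Service Handle (GSH) is a stable name, resolved on each use to a Grid Service Reference (GSR) that can change when the service migrates, and registrations are soft-state leases that expire unless renewed. Class and value names here are illustrative, not the OGSI API:

    import time

    class HandleResolver:
        # Toy GSH -> GSR resolver with soft-state lifetimes.
        def __init__(self):
            self._table = {}   # handle -> (reference, expiry)

        def register(self, gsh, gsr, lifetime_s):
            # Registrations are leases: they vanish unless renewed.
            self._table[gsh] = (gsr, time.time() + lifetime_s)

        def resolve(self, gsh):
            gsr, expiry = self._table[gsh]
            if time.time() > expiry:
                del self._table[gsh]       # 'soft' garbage collection
                raise LookupError(gsh + " expired")
            return gsr

    resolver = HandleResolver()
    gsh = "gsh://example.org/services/atlas-analysis"   # permanent name
    resolver.register(gsh, "http://node17.example.org:8080/svc", 3600)
    print(resolver.resolve(gsh))    # current location

    # After migration the handle stays the same; only the reference
    # changes, so clients holding the GSH are unaffected.
    resolver.register(gsh, "http://node42.example.org:8080/svc", 3600)
    print(resolver.resolve(gsh))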
If one GRID is good, then many GRIDs must be better
US Grid Projects
 NASA Information Power Grid
 DOE Science Grid
 NSF National Virtual Observatory
 NSF GriPhyN
 DOE Particle Physics Data Grid
 NSF DTF TeraGrid
 DOE ASCI DISCOM Grid
 DOE Earth Systems Grid
 DOE FusionGrid
 NEESGrid
 NIH BIRN
 NSF iVDGL
National Grid Projects
 Japan – Grid Data Farm, ITBL
 Netherlands – VLAM, DutchGrid
 Germany – UNICORE, Grid proposal
 France – Grid funding approved
 Italy – INFN Grid
 Eire – Grid-Ireland
 Poland – PIONIER Grid
 Switzerland – Grid proposal
 Hungary – DemoGrid, Grid proposal
 ApGrid – AsiaPacific Grid proposal
EU Grid Projects
 DataGrid (CERN, ..)
 EuroGrid (Unicore)
 DataTag (TTT…)
 Astrophysical Virtual Observatory
 GRIP (Globus/Unicore)
 GRIA (Industrial applications)
 GridLab (Cactus Toolkit)
 CrossGrid (Infrastructure Components)
 EGSO (Solar Physics)
 COG (Semantic Grid)
UK e-Science Programme
[Programme structure, reconstructed from the original diagram: the DG Research Councils sit above an e-Science Steering Committee and a Grid TAG. The Director has an awareness and co-ordination role over the Academic Application Support Programme (Research Councils £74m, DTI £5m) and a management role over the Generic Challenges programme (EPSRC £15m, DTI £15m). The application funding breaks down as PPARC £26m, BBSRC £8m, MRC £8m, NERC £7m, ESRC £3m, EPSRC £17m, CLRC £5m, with £80m of collaborative projects and £40m of industrial collaboration. From Tony Hey, 27 July 01.]
Key Elements
 Development of Generic Grid Middleware
 Network of Grid Core Programme e-Science Centres
– National Centre http://www.nesc.ac.uk
– Regional Centres http://www.esnw.ac.uk/
 Grid IRC Grand Challenge Project
 Support for e-Science Pilots
 Short term funding for e-Science demonstrators
 Grid Network Team
 Grid Engineering Team
 Grid Support Centre
 Task Forces
– Database led by Norman Paton
– Architecture led by Malcolm Atkinson
 International Involvement
Adapted from Tony Hey, 27 July 01
National & Regional Centres
 Centres donate equipment
to make a Grid
[Map of centres: Edinburgh, Glasgow, Newcastle, Belfast, DL, Manchester, Oxford, Cardiff, RAL, Cambridge, Hinxton, London, Southampton]
e-Science Demonstrators
 Dynamic Brain Atlas
 Biodiversity
 Chemical Structures
 Mouse Genes
 Robotic Astronomy
 Collaborative Visualisation
 Climateprediction.com
 Medical Imaging/VR
Grid Middleware R&D
 £16M funding available for industrial collaborative projects
 £11M allocated to Centres projects plus £5M for ‘Open
Call’ projects
 Set up Task Forces
– Database Task Force
– Architecture Task Force
– Security Task Force
Grid Network Team
 Expert group to identify end-to-end network bottlenecks and other network issues
– e.g. problems with multicast for Access Grid
 Identify e-Science project requirements
 Funding £0.5M traffic engineering/QoS project with PPARC, UKERNA and CISCO
– investigating MPLS using SuperJANET network
 Funding DataGrid extension project investigating bandwidth scheduling with PPARC
 Proposal for 'UKLight' lambda connection to Chicago and Amsterdam
UK e-Science Pilot Projects
 GRIDPP (PPARC)
 ASTROGRID (PPARC)
 Comb-e-Chem (EPSRC)
 DAME (EPSRC)
 DiscoveryNet (EPSRC)
 GEODISE (EPSRC)
 myGrid (EPSRC)
 RealityGrid (EPSRC)
 RASMOL
 Climateprediction.com (NERC)
 Oceanographic Grid (NERC)
 Molecular Environmental Grid (NERC)
 NERC DataGrid (+ OST-CP)
 Biomolecular Grid (BBSRC)
 Proteome Annotation Pipeline (BBSRC)
 High-Throughput Structural Biology (BBSRC)
 Global Biodiversity (BBSRC)
e-Science Centres of Excellence
 Birmingham/Warwick – Modelling
 Bristol – Media
 UCL – Networking
 White Rose Grid – Leeds, York, Sheffield
 Lancaster – Social Science
 Leicester – Astronomy
 Reading – Environment
UK e-Science Grid
[Map of the UK e-Science Grid: Edinburgh, Glasgow, Newcastle, Belfast, Manchester, DL, Oxford, Cardiff, Cambridge, RL, London, Hinxton, Soton]
UK e-Science Funding
 First Phase: 2001–2004
 Application Projects
– £74M
– All areas of science and engineering
 Core Programme
– £15M + £20M (DTI)
– Collaborative industrial projects

 Second Phase: 2003–2006
 Application Projects
– £96M
– All areas of science and engineering
 Core Programme
– £16M
– Core Grid Middleware
– DTI follow-on?
 EPSRC: Computer Science for e-Science
– £9M, 18 projects so far
 ESRC: National e-Social Science Centre + 3 hubs
– ~£6M
 PPARC
 MRC
 BBSRC
Core Programme: Phase 2
 UK e-Science Grid/Centres and e-Science Institute
 Grid Operation Centre and Network Monitoring
 Core Middleware engineering
 National Data Curation Centre
 e-Science Exemplars/New Opportunities
 Outreach and International involvement
Other Activities
 Security Task Force
– Jointly fund key security projects with EPSRC & JCSR; coordinated effort with NSF NMI and Internet2 projects
– JCSR £2M call in preparation
 UK Digital Curation Centre
– £3M, Core e-Science + JCSR
 JCSR
– £3M per annum
SR2004 – e-Science Infrastructure
 Persistent UK e-Science Research Grid
 Grid Operations Centre
 UK Open Middleware Infrastructure Institute
 National e-Science Institute
 UK Digital Curation Centre
 AccessGrid Support Service
 e-Science/Grid collaboratories Legal Service
 International Standards Activity
Conclusions
Today's Grid
 A Single System Image
 Transparent wide-area access to large data banks
 Transparent wide-area access to applications on heterogeneous platforms
 Transparent wide-area access to processing resources
 Security, certification, single sign-on authentication, AAA
– Grid Security Infrastructure
 Data access, transfer & replication
– GridFTP, Giggle
 Computational resource discovery, allocation and process creation
– GRAM, Unicore, Condor-G
Reality Checks!!
 The Technology is Ready
– Not true — it's emerging
• Building middleware, advancing standards, developing dependability
• Building demonstrators
• The computational grid is in advance of the data-intensive middleware
• Integration and curation are probably the obstacles
• But!! It doesn't have to be all there to be useful.
 We know how we will use grid services
– No — disruptive technology
• Lower the barriers of entry.
Grid Evolution
 1st Generation Grid
– Computationally intensive, file access/transfer
– Bag of various heterogeneous protocols & toolkits
– Recognises internet, ignores Web
– Academic teams
 2nd Generation Grid
– Data intensive -> knowledge intensive
– Services-based architecture
– Recognises Web and Web services
– Global Grid Forum
– Industry participation
We are here!
Impacts
 It's all about interoperability, really.
 Web & Grid Services are creating a new marketplace for
components
 If you're concerned with systems integration or internet
delivery of services, embrace Web Services technologies
now. You'll be ready for Grid Services when they're ready
for you.
– If you're a developer, get Web Services on your CV
– If you're an IT manager, collect Web Service expertise through hiring or
training
 Software license models must adapt
I don't want to share!
Do I need a grid?
In conclusion
 The GRID is not, and will not be, free
– must pay for resources
 What have we to show for £250M?
Acknowledgements
 Carole Goble
 Stephen Pickles
 Paul Jeffreys
 University of Manchester
 Academic collaborators
 Industrial collaborators
 Funding Agencies: DTI, EPSRC, NERC, ESRC, PPARC
SVE @ Manchester Computing
World Leading Supercomputing Service, Support and Research
Bringing Science and Supercomputers Together
www.man.ac.uk/sve
sve@man.ac.uk