UK e-Science & The National e-Science Centre Chinese Delegation Visit

Chinese Delegation Visit
High Performance Computer Mission
UK e-Science & The National e-Science Centre
Prof. Malcolm Atkinson
Director
www.nesc.ac.uk
22nd October 2003
Outline
What is e-Science?
Our Role in Enabling e-Science
Information Grids & Data Dominates
Foundation for e-Science
e-Science methodologies will rapidly transform
science, engineering, medicine and business
driven by exponential growth (×1000/decade)
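As a quick sanity check on the ×1000/decade figure (illustrative arithmetic only, not from the deck): a thousandfold per decade is the familiar "doubling roughly every year" rate, since 2^10 = 1024 ≈ 1000.

```python
# Sanity check: x1000 growth per decade corresponds to doubling
# approximately once a year (2^10 = 1024, close to 1000).
import math

doubling_time_years = 10 * math.log(2) / math.log(1000)
print(f"doubling every {doubling_time_years:.2f} years")   # ~1.00
```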
enabling a whole-system approach
[Diagram, derived from Ian Foster's slide: computers, software, Grid, sensor nets, instruments, colleagues, shared data archives]
e-Science and SR2002
Research Council          2001-4        2004-6
Medical                   £8M           £13.1M
Biological                £8M           £10.0M
Environmental             £7M           £8.0M
Eng & Phys                £17M          £18.0M
HPC                       £9M           £2.5M
Core Prog.                £15M + £20M   £16.2M + ?
Particle Phys & Astro     £26M          £31.6M
Economic & Social         £3M           £10.6M
Central Labs              £5M           £5.0M
NeSC in the UK
[Map: e-Science centres at Edinburgh (National e-Science Centre, HPC(x)), Glasgow, Newcastle, Belfast, Daresbury Lab, Manchester, Cambridge, Hinxton, Oxford, Cardiff, RAL, London and Southampton, with a link to the Globus Alliance]
Directors' Forum
Helped build a community:
Engineering Task Force
Grid Support Centre
Architecture Task Force: UK adoption of OGSA, OGSA Grid Market, Workflow Management
Database Task Force: OGSA-DAI, GGF DAIS-WG
Events held in the 2nd Year
(from 1 Aug 2002 to 31 Jul 2003)
We have had 86 events (Year 1 figures in brackets):
11 project meetings (4)
11 research meetings (7)
25 workshops (17 + 1)
2 “summer” schools (0)
15 training sessions (8)
12 outreach events (3)
5 international meetings (1)
5 e-Science management meetings (7)
(though the definitions are fuzzy!)
> 3600 participant days
Establishing a training team
Investing in community building, skill generation & knowledge development
Suggestions always welcome
Workshops
Space for real work
http://www.nesc.ac.uk/events/
Crossing communities
Creativity: new strategies and solutions
Written reports
Scientific Data Mining, Integration and Visualisation
Grid Information Systems
Portals and Portlets
Virtual Observatory as a Data Grid
Imaging, Medical Analysis and Grid Environments
Grid Scheduling
Provenance & Workflow
GeoSciences & Scottish Bioinformatics Forum
Duties of a NeSC Research Leader
encouraging the uptake of Grid technologies in Astronomy and related fields
encouraging visitors, with whom you have a research overlap, to visit Edinburgh and work with you and other local colleagues
organising and running research workshops
assisting with the development of new core Grid and scientific database technologies
promoting NeSC within the Universities of Edinburgh and Glasgow through, for example, personal presentations, and more widely at conferences and workshops.
…and all that in 0.5 FTE!
Slide from Dr Bob Mann: 30 September 2003
Three-way Alliance
Multi-national, Multi-discipline, Computer-enabled Consortia, Cultures & Societies
Theory, Models & Simulations → Shared Data
Experiment & Advanced Data Collection → Shared Data
Requires much Computing Science: Engineering, Systems, Notations & Formal Foundation; much innovation
Changes culture: new mores, new behaviours → Process & Trust
New Opportunities, New Results, New Rewards
Global Drivers of e-Science
Collaboration
Data Deluge
Digital Technology
Ubiquity
Cost reduction
Performance increase
Consequential Investment
UK e-Science £240 million + 80 companies
EU e-Infrastructure
USA cyberinfrastructure
…
It’s Easy to Forget
How Different 2003 is From 1993
Enormous quantities of data: Petabytes
For an increasing number of communities
gating step is not collection but analysis
Ubiquitous Internet: >100 million hosts
Collaboration & resource sharing the norm
Security and Trust are crucial issues
Ultra-high-speed networks: >10 Gb/s
Global optical networks
Bottlenecks: last kilometre & firewalls
Huge quantities of computing: >100 Top/s
Moore’s law gives us all supercomputers
Organising their effective use is the challenge
Moore’s law everywhere
Instruments, detectors, sensors, scanners, …
Organising their effective use is the challenge
Derived from Ian Foster’s slide at SSDBM, July 2003
Global Knowledge Communities
Often Driven by Data: E.g., Astronomy
No. & sizes of data sets as of mid-2002,
grouped by wavelength
• 12 waveband coverage of large
areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
Tera → Peta Bytes
                      1 Terabyte               1 Petabyte
RAM time to move      15 minutes               2 months
1 Gb WAN move time    10 hours ($1000)         14 months ($1 million)
Disk cost             7 disks = $5000 (SCSI)   6800 disks + 490 units + 32 racks = $7 million
Disk power            100 Watts                100 Kilowatts
Disk weight           5.6 Kg                   33 Tonnes
Disk footprint        Inside machine           60 m2
May 2003 Approximately Correct
See also: Distributed Computing Economics, Jim Gray, Microsoft Research, MSR-TR-2003-24
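The petabyte WAN figure follows by simple scaling from the terabyte one. A quick check (illustrative arithmetic, taking the slide's quoted 10 hours/TB as the effective rate, which is well below the 1 Gb/s line rate because of protocol and last-mile overheads):

```python
# Scaling check for the Tera -> Peta slide. Assumption: the quoted
# 10 hours per terabyte defines the effective WAN throughput.
TB = 10**12                                      # bytes
PB = 10**15

hours_per_tb = 10.0                              # figure from the slide
effective_bps = TB * 8 / (hours_per_tb * 3600)   # ~222 Mb/s effective
pb_hours = PB * 8 / effective_bps / 3600         # scales linearly: 10,000 h
pb_months = pb_hours / (24 * 365.25 / 12)        # ~13.7, i.e. "14 months"

print(f"effective throughput ~{effective_bps/1e6:.0f} Mb/s")
print(f"1 PB transfer ~{pb_months:.1f} months")
```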
Three Pillars of e-Science Research
[Diagram: three pillars
Foundations (Informatics): apply known results
Technology: focus for new work
Applications: enable new science; steering of development
with contributors including EPCC, DCS, ETF & Testbeds, Physics & Astronomy, edikt, Repositories, the computing industry, research departments, research institutes, Scottish universities and commercial customers]
Deployment: UK Test Grid (funding?), L2G, IBM contract, Local Grid, OGSA deployment testbed?
Research Applications: FireGrid, NanoCMOS Grid proposals, ODD-Genes?, BioInf, Physics, AstroGrid, NeuroInf, ScotGrid, GridPP, QCDGrid, e-Health
CS Research: Engage, EQUATOR, CoAKTinG, Mobile Code, Scientific Data, Automated Synthesis, AMUSE, DIRC, Ontologies (projects, staff, papers)
Middleware, etc.: MS.NETGrid, FirstDIG, BRIDGES, ODD-Genes, GridWeaver, SunDCG, OGSA-DAI, DAIT, eldas, Grid Core Programme, BinX, edikt, OMII, PGPGrid
Data Access, Integration, Publication, Annotation and Curation
[Diagram of related projects, spanning research applications and CS research: ODD-Genes, BRIDGES, AstroGrid, ScotGrid, GridPP, OGSA-DAI & DAIT, edikt, QCDGrid, SuperCOSMOS, the DAIPAC theme, GTI, Mobile Code, Repositories, Mouse Atlas, Scientific Data, the DCC proposal, Linguistic Corpora, ArkDB]
Infrastructure Architecture
[Layered diagram, top to bottom:]
Data Intensive Users
Data Intensive Applications for Science X
Simulation, Analysis & Integration Technology for Science X
Generic Virtual Data Access and Integration Layer: Job Submission, Brokering, Registry, Banking, Data Transport, Workflow, Structured Data Integration, Authorisation
OGSA: Resource Usage, Transformation, Structured Data Access
OGSI: Interface to Grid Infrastructure
Compute, Data & Storage Resources
Structured Data: Relational, XML, Semi-structured; Distributed Virtual Integration Architecture
Data Services
GGF Data Access and Integration Services (DAIS)
OGSI-compliant interfaces to access relational and XML databases
Needs to be generalized to encompass other data sources (see next slide…)
Generalized DAIS becomes the foundation for:
Replication: data located in multiple locations
Federation: composition of multiple sources
Provenance: how was data generated?
“OGSA Data Services” (Foster, Tuecke, Unger, eds.)
Describes a conceptual model for representing all manner of data sources as Web services: databases, filesystems, devices, programs, …
Integrates WS-Agreement
A data service is an OGSI-compliant Web service that implements one or more of the base data interfaces: DataDescription, DataAccess, DataFactory, DataManagement
These would be extended and combined for specific domains (including DAIS)
OGSA-DAI Approach
Reuse existing technologies and standards
OGSA, Query languages, Java, transport
Build portTypes and services which will enable:
controlled exposure of heterogeneous data resources on an OGSI-compliant grid
access to these resources via common interfaces using existing underlying query mechanisms
(ultimately) data integration across distributed data resources
OGSA-DAI (the software) seeks to be a reference implementation of
the GGF DAIS WG standard
Can’t keep up with frequent standard changes, so software releases track
specific drafts
See http://www.ogsadai.org.uk/ for details.
Data Access & Integration Services
[Sequence diagram: Client ↔ Registry, Factory and Grid Data Service (GDS) over SOAP/HTTP; service creation; API interactions; the GDS fronts an XML / relational database]
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc.
3b. GDS interacts with database
3c. Results of query returned to client as XML
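The registry/factory/GDS flow in steps 1a-3c can be modelled in miniature. The class and method names below echo the slide's vocabulary only; they are not the real OGSA-DAI Java API.

```python
# Toy model of the registry -> factory -> GridDataService interaction.
# Names are illustrative, chosen to match the slide's steps 1a-3c.

class GridDataService:
    def __init__(self, database):
        self._db = database                     # 2b: GDS manages access
    def perform(self, query):                   # 3a/3b: query the database
        return self._db.get(query, [])          # 3c: results back to client

class Factory:
    def __init__(self, database):
        self._db = database
    def create_service(self):                   # 2b
        return GridDataService(self._db)        # 2c: GDS "handle"

class Registry:
    def __init__(self):
        self._factories = {}
    def register(self, topic, factory):
        self._factories[topic] = factory
    def lookup(self, topic):                    # 1a/1b: factory handle
        return self._factories[topic]

# Wiring it together, as in steps 1a-3c:
db = {"SELECT * FROM stars": [("Vega",), ("Deneb",)]}
registry = Registry()
registry.register("x", Factory(db))

factory = registry.lookup("x")                  # 1a/1b
gds = factory.create_service()                  # 2a-2c
print(gds.perform("SELECT * FROM stars"))       # [('Vega',), ('Deneb',)]
```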
Third Party Delivery
[Diagram: the client's requestor API stub asks a Grid Data Service to deliver data sets (steps 1-4) directly to a separate consumer's API stub, rather than routing the bulk data through the requesting client]
Future DAI Services
[Diagram: an analyst, coding scientific insights in a Problem Solving Environment / client application, works through a Data Access & Integration master; GridDataServices (GDS1-GDS3) and transport services (GDTS1, GDTS2) span an XML database (Sx) and a relational database (Sy), linked by a semantic metadata network; SOAP/HTTP, service creation and API interactions as before]
1a. Request to Data Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration from resources Sx and Sy
2b. Factory creates GridDataServices
2c. Factory returns handle of GDS to client
3a. Client submits a sequence of scripts, each with a set of queries, to the GDS with XPath, SQL, etc.
3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
Take Home Message
Data is a Major Source of Challenges
AND an Enabler of
New Science, Engineering, Medicine, Planning, …
Information Grids
There are many opportunities for
International collaboration
Support for collaboration
Support for computation and data grids
Structured data fundamental
Integrated strategies & technologies needed
OGSA-DAI is here now
Join in making better DAI services & standards
Bioinformatics is a Priority Application Area