Data Dominated e-Science Visit of Clinton Foster 20 August 2003

Data Dominated e-Science
Dr. Dave Berry
Research Manager
www.nesc.ac.uk
Visit of Clinton Foster
20th August 2003
1
Outline
What is e-Science?
3 modes & 2 examples
Delivering e-Science
Open Grid Services Architecture
UK e-Science
UK e-Science: Roles and Resources
Grid Infrastructure
2
3
Foundation for e-Science
e-Science methodologies will rapidly transform
science, engineering, medicine and business
driven by exponential growth (×1000/decade)
computers
software
Grid
sensor nets
instruments
colleagues
Shared data archives
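To make ×1000/decade concrete, the decadal factor can be converted into a yearly multiplier and a doubling time. A minimal Python sketch, assuming smooth compound growth (the slide gives only the decadal figure):

```python
import math

# Growth factor quoted on the slide: x1000 per decade.
FACTOR_PER_DECADE = 1000.0
YEARS_PER_DECADE = 10

def annual_growth_factor(factor: float, years: float) -> float:
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)

def doubling_time_years(factor: float, years: float) -> float:
    """Years needed to double, given `factor` growth over `years`."""
    return years * math.log(2) / math.log(factor)

if __name__ == "__main__":
    g = annual_growth_factor(FACTOR_PER_DECADE, YEARS_PER_DECADE)
    t = doubling_time_years(FACTOR_PER_DECADE, YEARS_PER_DECADE)
    print(f"annual factor ~ {g:.3f}, doubling time ~ {t:.2f} years")
```

Under that assumption, ×1000 per decade is roughly a doubling every year.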
4
Focus for Three Modes of Thought
Computing Science: Systems, Notations & Formal Foundation → Process & Trust
Experiment & Advanced Data Collection → Shared Data
Models & Simulations → Shared Data
Results
5
Virtual Organisations
Multi-national, Multi-discipline, Computer-enabled
Consortia, Cultures & Societies
Requires Much Engineering, Much Innovation
Changes Culture: New Mores, New Behaviours
New Opportunities, New Results, New Rewards
6
Global in-flight engine diagnostics
In-flight data; airline; global network (e.g. SITA); ground station; DS&S Engine Health Center; internet, e-mail, pager; maintenance centre; data centre
Distributed Aircraft Maintenance Environment: Universities of Leeds, Oxford, Sheffield & York
7
Biology & Medicine
Extensive Research Community
>1000 per research university
Extensive Applications
Health, Food, Environment
Interacts with virtually every discipline
Physics, Chemistry, Nanoengineering, …
450 Databases relevant to bioinformatics
Heterogeneity, Interdependence, Complexity, Change, …
Wonderful Scientific Questions
How does a cell work?
How does a brain work?
How does an organism develop?
Why is the biosphere so stable?
What happens to the biosphere when the earth warms up?
…
8
Database Growth [chart: 39,856,567,747]
PDB Content Growth [chart]
9
Database-mediated Communication
Curated Shared Database, linking the Experimentation Community, the Simulation Community and the Analysis & Theory Community
Data carries knowledge between communities; knowledge and results flow back out
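The slide's point is that communities communicate indirectly, through data deposited in a curated shared database rather than directly with each other. A toy Python sketch of that pattern, assuming a simple curation gate (a record becomes visible only after a curator approves it) and a contributor tag on each record; all class and method names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Record:
    key: str              # e.g. a gene or structure identifier (hypothetical)
    value: str            # the deposited data
    community: str        # contributing community, kept for recognition
    curated: bool = False # only curated records are visible to others

class CuratedSharedDatabase:
    """In-memory stand-in: communities exchange knowledge only via records."""

    def __init__(self) -> None:
        self._records: list[Record] = []

    def deposit(self, key: str, value: str, community: str) -> Record:
        """A community contributes data; it stays hidden until curated."""
        rec = Record(key, value, community)
        self._records.append(rec)
        return rec

    def curate(self, rec: Record) -> None:
        """A curator approves a record, publishing it to all communities."""
        rec.curated = True

    def query(self, key: str) -> list[Record]:
        """Any community reads curated records for a key."""
        return [r for r in self._records if r.key == key and r.curated]
```

Keeping the contributing community on each record mirrors the later concern with contributor recognition.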
10
Organisational & Cultural Changes
Access to Computation & Data must be simple
All use a computational, semantic, data-rich web
i.e. it's invisible – the portal / browser lets you do more
Responsibility of data publishers
Cost, dependability, trustworthiness, capability, flexibility, …
Shared contributions compose indefinitely
Knowledge accumulation and interdependence
Contributor recognition and IPR
Complexity and management of infrastructure
Always on
Must be sustained
Paid for
Hidden
11
TeraBytes → PetaBytes
RAM time to move: 15 minutes → 2 months
1 Gb/s WAN move time: 10 hours ($1000) → 14 months ($1 million)
Disk cost: 7 disks, $5000 (SCSI) → 6800 disks + 490 units + 32 racks, $7 million
Disk power: 100 Watts → 100 Kilowatts
Disk weight: 5.6 kg → 33 tonnes
Disk footprint: inside machine → 60 m²
Figures approximately correct as of May 2003
See also: Jim Gray, Distributed Computing Economics, Microsoft Research, MSR-TR-2003-24
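The WAN row can be checked with simple arithmetic. A Python sketch, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 1000 TB) and constant throughput as volume grows; the 10 hours per TB is taken from the slide, not derived:

```python
BYTES_PER_TB = 10**12                 # decimal terabyte (assumption)
TB_PER_PB = 1000
HOURS_PER_MONTH = 365.25 * 24 / 12    # average month, about 730.5 hours

def ideal_transfer_hours(n_bytes: float, link_bits_per_s: float) -> float:
    """Lower bound: time at full line rate, ignoring protocol and disk overheads."""
    return n_bytes * 8 / link_bits_per_s / 3600

def scaled_transfer_months(hours_per_tb: float, terabytes: float) -> float:
    """Scale an observed per-TB transfer time linearly to a larger volume."""
    return hours_per_tb * terabytes / HOURS_PER_MONTH

if __name__ == "__main__":
    print(f"ideal 1 TB over 1 Gb/s: {ideal_transfer_hours(BYTES_PER_TB, 1e9):.1f} h")
    print(f"1 PB at 10 h/TB: {scaled_transfer_months(10, TB_PER_PB):.1f} months")
```

At full line rate 1 TB would take about 2.2 hours, so the slide's 10 hours implies roughly 22% effective utilisation; scaling 10 hours by 1000 gives about 13.7 months, consistent with the quoted 14 months.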
12
13
Open Grid Services Architecture
Web Services + Grid Technology → Grid Services
14
Web Services
Independence: Client from Service, Service from Client
Description: Web Services DL, …
Separation: Function from Delivery
Tools & Platforms: Java ONE, Visual .NET, WebSphere, Oracle
Commercial Buy-in
www.w3.org/TR/SOAP or www.w3.org/TR/wsdl
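Separating function from delivery means a client needs only what the service description publishes (operation name, namespace, parameters) to build a request. A minimal sketch using Python's standard `xml.etree.ElementTree` to wrap an RPC-style call in a SOAP 1.1 envelope; the operation `getStatus`, its `jobId` parameter and the namespace `urn:example:grid` are hypothetical:

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_request(op_ns: str, operation: str, params: dict[str, str]) -> bytes:
    """Wrap an RPC-style call in a SOAP 1.1 Envelope/Body."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element and its arguments live in the service's own namespace.
    call = ET.SubElement(body, f"{{{op_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{op_ns}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")

if __name__ == "__main__":
    # Hypothetical operation and namespace, for illustration only.
    msg = build_soap_request("urn:example:grid", "getStatus", {"jobId": "42"})
    print(msg.decode("utf-8"))
```

Nothing in the request depends on how or where the service is implemented, which is the client/service independence the slide lists.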
15
Grid Technology
Distribution: Various Protocols, FTP
Security: Single Sign-on
Resource Sharing: Discovery, Process Creation, Scheduling
Portability: APIs
Government Agency Buy-in
Foster, I., Kesselman, C. and Tuecke, S., The Anatomy of the Grid: Enabling Scalable Virtual Organizations, Intl. J. Supercomputer Applications, 15(3), 2001
16
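Resource sharing combines discovery (which resources could run a job?) with scheduling (which one should?). A toy Python sketch of how the two fit together; resource names, attributes and the least-loaded policy are assumptions, and this is not the interface of any Globus component:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Resource:
    name: str          # hypothetical resource identifier
    os_name: str       # advertised operating system
    free_cpus: int     # currently free processors
    queued_jobs: int   # current load, used by the scheduling policy

def discover(pool: list[Resource], os_name: str, cpus: int) -> list[Resource]:
    """Discovery: keep only resources that meet the job's requirements."""
    return [r for r in pool if r.os_name == os_name and r.free_cpus >= cpus]

def schedule(pool: list[Resource], os_name: str, cpus: int) -> Optional[Resource]:
    """Scheduling: choose the least-loaded matching resource, or None if none match."""
    return min(discover(pool, os_name, cpus), key=lambda r: r.queued_jobs, default=None)

if __name__ == "__main__":
    pool = [
        Resource("cluster-a", "linux", 64, 5),
        Resource("cluster-b", "linux", 16, 1),
        Resource("smp-c", "solaris", 32, 0),
    ]
    chosen = schedule(pool, "linux", 32)
    print(chosen.name if chosen else "no matching resource")
```

Keeping discovery separate from the selection policy means the policy can change (e.g. to cost or data locality) without touching how resources are found.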
OGSA Features
Description & Discovery: WSDL + WSIL
Lifetime Management: Factories, Transient & Persistent GS, GS Handles, GS References, Soft State
Notification
Change Management
Invocation: SOAP, RPC, …
Representations: XML + Schema
Authentication: Certificates + Delegation
Tools & Platforms: Apache Axis, …
Platform
Foster, I., Kesselman, C., Nick, J. and Tuecke, S., The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration
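Soft-state lifetime management is the least familiar idea in the feature list: a transient service instance created by a factory is destroyed automatically unless clients keep extending its termination time, so abandoned instances clean themselves up. A minimal Python sketch of that idea; class and method names and the handle format are hypothetical and do not reproduce the OGSI operations:

```python
import itertools
from dataclasses import dataclass

@dataclass
class ServiceInstance:
    handle: str              # stands in for a Grid Service Handle
    termination_time: float  # soft state: reclaimed after this time unless extended

class Factory:
    """Creates transient service instances; reclaims those whose lifetime has lapsed."""

    def __init__(self, default_lifetime: float) -> None:
        self.default_lifetime = default_lifetime
        self.instances: dict[str, ServiceInstance] = {}
        self._ids = itertools.count(1)

    def create(self, now: float) -> ServiceInstance:
        handle = f"gsh://example.org/instance/{next(self._ids)}"  # hypothetical format
        inst = ServiceInstance(handle, now + self.default_lifetime)
        self.instances[handle] = inst
        return inst

    def keep_alive(self, handle: str, now: float, extension: float) -> bool:
        """Client asks for more time; refused once the instance has expired."""
        inst = self.instances.get(handle)
        if inst is None or inst.termination_time <= now:
            return False
        inst.termination_time = max(inst.termination_time, now + extension)
        return True

    def sweep(self, now: float) -> list[str]:
        """Destroy every expired instance and return the handles that were reclaimed."""
        expired = [h for h, i in self.instances.items() if i.termination_time <= now]
        for h in expired:
            del self.instances[h]
        return expired
```

Passing `now` explicitly keeps the sketch deterministic; a real hosting environment would use its own clock.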
17
18
UK e-Science Funding
First Phase: 2001–2004
Application Projects: £74M, >60 projects, 340 at first All Hands Meeting
Core Programme: £35M, collaborative industrial projects with ~80 companies
Second Phase: 2003–2006
Application Projects: £96M
Core Programme: £16M + £25M (?), Core Grid Middleware
19
All areas of science and engineering
Research Council          2001–4         2004–6
Medical                   £8M            £13.1M
Biological                £8M            £10M
Environmental             £7M            £8M
Eng & Phys                £17M           £18M
HPC                       £9M            £2.5M
Core Prog.                £15M + £20M    £16.2M + £DTI
Particle Phys & Astro     £26M           £31.6M
Economic & Social         £3M            £10.6M
Central Labs              £5M            £5M
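Matching the 2004–6 figures to the same order of Research Council areas as the 2001–4 column is an assumption, but the totals support it: summing the application lines (everything except HPC and the Core Programme) reproduces the £74M and £96M application-project figures on the funding slide. A small Python check:

```python
# £M per Research Council area as (2001–4, 2004–6), read row by row from the slide.
# HPC and the Core Programme are left out: treated as not application projects (assumption).
ALLOCATIONS = {
    "Medical": (8, 13.1),
    "Biological": (8, 10),
    "Environmental": (7, 8),
    "Eng & Phys": (17, 18),
    "Particle Phys & Astro": (26, 31.6),
    "Economic & Social": (3, 10.6),
    "Central Labs": (5, 5),
}

def application_total(phase: int) -> float:
    """Total application funding for phase 0 (2001–4) or phase 1 (2004–6), in £M."""
    # Round to 0.1 to absorb floating-point noise from the decimal figures.
    return round(sum(row[phase] for row in ALLOCATIONS.values()), 1)

if __name__ == "__main__":
    print(application_total(0), application_total(1))  # compare with £74M and £96M
```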
20
NeSC in the UK
Map of UK e-Science centres: Glasgow, Edinburgh, Newcastle, Belfast, Manchester, Daresbury Lab, Cambridge, Oxford, Hinxton, RAL, Cardiff, London, Southampton; HPC(x)
Activities shown: Directors’ Forum, Architecture Task Force, UK Adoption of OGSA, OGSA Grid Market, Workflow Management, Database Task Force, OGSA-DAI, GGF DAIS-WG, Engineering Task Force, Grid Support Centre, GridNet, e-Storm
e-Science Institute: training, coordination, community building, workshops, pioneering
21
UK Grid: Operational
Currently based on Globus Toolkit 2
Transition to OGSI/OGSA over the next year
Heterogeneous
Many architectures and operating systems
Many organisations
Many issues still to be resolved, e.g.
OGSA definition / delivery
Portals
Combinations of Services supported
Account management and accounting
22
ODD-Genes
Diagram components: PSE, registry, database (×2), engine 1, engine 2
23
www.nesc.ac.uk
24