Learning from e-Science: Grids and e-Infrastructure
at GridKa School, Karlsruhe
Malcolm Atkinson
Director
www.nesc.ac.uk
26th September 2005
Contents
History & Motivation for e-Science & Grids
Licklider to Foster & Kesselman to …
What are Grids & Web Services
Disruptive technology advances
Rallying cries
Example Grids & Projects
National Grids, Open Science Grid, BIRN, GEON, …
Data and Information – Biomedicine, Engineering & Finance
Example Middleware Project
OGSA-DAI – international use and collaboration
Reaping the Benefit of Grids
What is the impact on your research
Future e-Infrastructure
Ambient Knowledge-based Computational Infrastructure
What is e-Science?
Goal: to enable better research in all disciplines
Method: Invention and exploitation of advanced computational methods
• to generate, curate and analyse research data
– From experiments, observations and simulations
– Quality management, preservation and reliable evidence
• to develop and explore models and simulations
– Computation and data at extreme scales
– Trustworthy, economic, timely and relevant results
• to enable dynamic distributed virtual organisations
– Facilitating collaboration with information and resource sharing
– Security, reliability, accountability, manageability and agility
Distributed Systems to Grids
Bespoke pioneering distributed systems, e.g. SABRE & SAGE
Licklider proposes shared multi-site computing
ARPANET
IBM CICS
Ethernet
TCP
Dozens of academic networks
Two dominant protocols
CORBA & DCOM
IP-based Internet
Academic & Research
WWW
1960
1970
1980
1990
2000
Distributed Systems to Grids (continued)
[Same timeline, 1960 to 2000, with these additions]
Collaboration via shared bio/chem/medical “DBs”
Condor
I-way
Globus starts
Unicore
Web Services
Using many protocols & M/W stacks
Many research grids
D-Grid starts
Why use / build Grids?
Research Arguments
Enables new ways of working
New distributed & collaborative research
Unprecedented scale and resources
Economic Arguments
Reduced system management costs
Shared resources ⇒ better utilisation
Pooled resources ⇒ increased capacity
Load sharing & utility computing
Cheaper disaster recovery
Why use / build Grids?
Computer Science Arguments
New attempt at an old hard problem
Frustrating ignorance of existing results
New scale, new dynamics, new scope
Engineering Arguments
Enable autonomous organisations to
– Write complementary software components
– Set up, run & use complementary services
– Share operational responsibility
Why use / build Grids?
Political & Management Arguments
Stimulate innovation
Promote intra-organisation collaboration
Promote inter-enterprise collaboration
What is e-Infrastructure – Political view
A shared resource
• That enables science, research, engineering, medicine, industry, …
• It will improve UK / European / … productivity
– Lisbon Accord 2000
– E-Science Vision SR2000 – John Taylor
• Commitment by UK government
– Sections 2.23-2.25
• Always there
– c.f. telephones, transport, power
UK e-Science Budget (2001-2006)
Total: £213M + £100M via JISC
By research council:
EPSRC (£77.7M) 37%
PPARC (£57.6M) 27%
MRC (£21.1M) 10%
BBSRC (£18M) 8%
NERC (£15M) 7%
ESRC (£13.6M) 6%
CLRC (£10M) 5%
EPSRC Breakdown:
Applied (£35M) 45%
Core (£31.2M) 40%
HPC (£11.5M) 15%
Staff costs; Grid Resources, Computers & Network funded separately
+ Industrial Contributions £25M
Source: Science Budget 2003/4 – 2005/6, DTI(OST)
Slide from Steve Newhouse
Grids: a foundation for e-Science
e-Science methodologies will rapidly transform
science, engineering, medicine and business
driven by exponential growth (×1000/decade)
enabling a whole-system approach
[Diagram: a Grid connecting computers, software, sensor nets, instruments, colleagues and shared data archives]
Diagram derived from Ian Foster’s slide
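As a quick check on the ×1000/decade figure, the sketch below works out the implied yearly factor and doubling time. It assumes the growth is smoothly exponential, which the slide does not state; the variable names are illustrative only.

```python
import math

# Growth factor per decade, as quoted on the slide.
FACTOR_PER_DECADE = 1000.0

# Assumption: smooth exponential growth, so the yearly factor is the tenth root.
annual_factor = FACTOR_PER_DECADE ** (1 / 10)

# Doubling time in years: solve annual_factor ** t == 2 for t.
doubling_time_years = math.log(2) / math.log(annual_factor)

print(f"annual factor: {annual_factor:.3f}")
print(f"doubling time: {doubling_time_years * 12:.1f} months")
```

Under that assumption, ×1000 per decade is close to doubling every year.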
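Service-Oriented Architecture
[Diagram: three roles and three interactions]
Roles: Registry, Client, Service
Interactions: Registration (Service with Registry), Discovery (Client via Registry), Invocation (Client calls Service)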
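The registration, discovery and invocation steps of a service-oriented architecture can be sketched in a few lines. This is a minimal in-memory illustration, not any particular Grid or Web Services toolkit: the `Registry` class, the "text-transform" kind and the example service are all hypothetical names.

```python
from typing import Callable, Dict, List

# A "service" here is just a callable; a real service would be a remote endpoint.
Service = Callable[[str], str]


class Registry:
    """In-memory stand-in for a service registry (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._services: Dict[str, List[Service]] = {}

    def register(self, kind: str, service: Service) -> None:
        # Registration: a provider advertises a service under a kind.
        self._services.setdefault(kind, []).append(service)

    def discover(self, kind: str) -> List[Service]:
        # Discovery: a client asks which services of a kind are known.
        return list(self._services.get(kind, []))


def upper_case_service(payload: str) -> str:
    return payload.upper()


registry = Registry()
registry.register("text-transform", upper_case_service)

# Invocation: the client calls the discovered service directly, not via the registry.
candidates = registry.discover("text-transform")
result = candidates[0]("grid")
print(result)  # GRID
```

The point of the pattern is that client and service are coupled only at run time, through what the registry knows.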
Grid Construction Principles
Dynamic coupling based on SOA
Respect Autonomy – local policies
Independent construction & provision
Requires adherence to agreed protocols
Interoperability
Preferably with widely adopted standards
Algorithms & Information structures
Must scale well
Must be tolerant to partial failure
Must be tolerant to partial system change
Mechanisms to build trust are essential
Much more than just providing services
Providing mutually consistent services with compatible policies
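One common way to tolerate partial failure is to try equivalent providers in turn and carry on when one is down. The sketch below shows that idea only; `call_with_failover`, `site_a` and `site_b` are made-up names, and real Grid middleware would also need timeouts, retries and policy checks.

```python
from typing import Callable, Iterable, List, Tuple


def call_with_failover(
    providers: Iterable[Callable[[int], int]], argument: int
) -> Tuple[int, List[str]]:
    """Try each provider in turn; return the first result and the errors seen."""
    errors: List[str] = []
    for provider in providers:
        try:
            return provider(argument), errors
        except Exception as exc:  # one failed site must not stop the whole task
            errors.append(f"{provider.__name__}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def site_a(x: int) -> int:
    # Simulates a site that is currently unreachable.
    raise ConnectionError("site A unreachable")


def site_b(x: int) -> int:
    return x * x


value, failures = call_with_failover([site_a, site_b], 7)
print(value)     # 49
print(failures)  # ['site_a: site A unreachable']
```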
Why are distributed systems hard
They are necessary to our modern way of life
– telephone, airline booking, internet, web, financial trading
– Access to remotely curated data and remote / mobile resources
– Integration of company-wide IT support for R&D
Global, always on, single-purpose systems
– Hard to build, finance and manage
– Often based on sharing a common (may be replicated) database
Why are distributed systems hard
Global, always on, multi-purpose
systems
Never been done – can it be done?
Only by a huge international
collaborative effort
Challenges versus Goals
Challenge | Goal
Heterogeneity & Variety | Simple operational model
Complex platform behaviour | Simple application model
Partial failures | Simple user model
Partial failures + large tasks | Minimal resource wastage
Autonomy – owner’s rights | Stability & uniformity
Independent provision | Simple resource access
Scale, costs & latency | Good performance
Vulnerable to misuse | Dependable protection
Diverse & Evolving | Flexible & agile
Valuable assets (reputation, equipment, teams, data, algorithms, working practices) | IPR & assets well protected
The Primary Requirement …
Building people grids
Enabling People to Work Together on Challenging Projects: Science, Engineering & Medicine
The e-Science Centres
[Map figure; labelled centres and partners]
Globus Alliance
National Centre for e-Social Science
Open Middleware Infrastructure Institute
e-Science Institute
Grid Operations Support Centre
Digital Curation Centre
CeSC (Cambridge)
EGEE
National Institute for Environmental e-Science
How Grid Software Works: NSF Network for
Earthquake Engineering Simulation (NEES)
Transform our ability to carry out research vital to
reducing vulnerability to catastrophic earthquakes
Ian Foster
UCSF
UIUC
From Klaus Schulten, Center for Biomolecular Modelling and Bioinformatics, Urbana-Champaign
Grid2003: An Operational Grid
• 28 sites (2100-2800 CPUs) & growing
• 400-1300 concurrent jobs
• 8 substantial applications + CS experiments
• Running since October 2003
[Map label: Korea]
http://www.ivdgl.org/grid2003
Database Growth
PDB Content Growth
BRIDGES
[Diagram labels]
CFG Virtual Organisation: Glasgow, Edinburgh, Oxford, Leicester, Netherlands, London, each holding private data
VO Authorisation
Publicly Curated Data: Ensembl, OMIM, SWISS-PROT, MGI, HUGO, RGD, …
DATA HUB
OGSA-DAI
Information Integrator
Synteny Grid Service
BLAST
Mammography
A prototype of a national database of
mammographic images in support of the UK
breast screening programme
Temporal mammography
Standard Mammo Format
Computer Aided Detection
Mammograms have different appearances, depending on image settings and acquisition systems
3D View
eDiaMoND: Screening for Breast Cancer
[Diagram labels, grouped as they appear]
Patients; Screening; Assessment / Symptomatic; Epidemiology; Training
1 Trust → Many Trusts: Collaborative Working; Audit capability
Electronic Patient Records; Radiology reporting systems; Letters; Case Information
Biopsy; 2ndary Capture or FFD; X-Rays and Case Information
Other Modalities: MRI, PET, Ultrasound
Symptomatic/Assessment Information
eDiaMoND Grid
Case and Reading Information
Better access to case information and digital tools
SMF; CAD; Temporal Comparison
Digital Reading; 3D Images
Supplement mentoring with access to digital training cases and sharing of information across clinics
Manage Training Cases; Perform Training
Provided by eDiamond project: Prof. Sir Mike Brady et al.
Virtual Observatories
Observations made across entire electromagnetic spectrum
ROSAT ~keV; DSS Optical; 2MASS 2µm; IRAS 25µm; IRAS 100µm; GB 6cm; NVSS 20cm; WENSS 92cm
⇒ e.g. different views of a local galaxy
Need all of them to understand physics fully
Databases are located throughout the world
Peter Clarke
Data Services: challenges
• Scale
– Many sites, large collections, many uses
• Longevity
– Research requirements outlive technical decisions
• Diversity
– No “one size fits all” solutions will work
– Primary Data, Data Products, Meta Data, Administrative data, …
• Many Data Resources
– Independently owned & managed
  – No common goals
  – No common design
  – Work hard for agreements on foundation types and ontologies
  – Autonomous decisions change data, structure, policy, …
– Geographically distributed
• and I haven’t even mentioned security yet!
Slide from Neil Chue Hong
OGSA-DAI In One Slide
• An extensible framework for data
access and integration.
• Expose heterogeneous data
resources to a grid through web
services.
• Interact with data resources:
– Queries and updates.
– Data transformation / compression
– Data delivery.
• Customise for your project using
– Additional Activities
– Client Toolkit APIs
– Data Resource handlers
• A base for higher-level services
– federation, mining, visualisation,…
Slide from Neil Chue Hong
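The query, transformation, compression and delivery steps listed above can be pictured as a small pipeline. The sketch below uses only the Python standard library and is not the OGSA-DAI API: the function names, the `sample` table and its rows are all invented for illustration, and an in-memory SQLite database stands in for a real data resource.

```python
import csv
import gzip
import io
import sqlite3
from typing import List, Tuple

Rows = List[Tuple]


def query(conn: sqlite3.Connection, sql: str) -> Rows:
    """Query step: run SQL against a data resource."""
    return conn.execute(sql).fetchall()


def to_csv(rows: Rows) -> bytes:
    """Transformation step: render rows as CSV bytes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


def compress(data: bytes) -> bytes:
    """Compression step: GZIP the payload before delivery."""
    return gzip.compress(data)


# A throwaway in-memory database stands in for a real data resource.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT, value INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", [("alpha", 1), ("beta", 2)])

# Delivery step: here the payload is simply kept in a variable.
payload = compress(to_csv(query(conn, "SELECT name, value FROM sample ORDER BY name")))
print(gzip.decompress(payload).decode("utf-8"))
```

Keeping each step as a separate function is what lets steps be swapped or reordered per project.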
Core features of OGSA-DAI – I
• A framework for building applications
– Supports data access, insert and update
  – Relational: MySQL, Oracle, DB2, SQL Server, Postgres
  – XML: Xindice, eXist
  – Files: CSV, BinX, EMBL, OMIM, SWISSPROT,…
– Supports data delivery
  – SOAP over HTTP
  – FTP; GridFTP
  – E-mail
  – Inter-service
– Supports data transformation
  – XSLT
  – ZIP; GZIP
– Supports security
  – X.509 certificate based security
Slide from Neil Chue Hong
Core features of OGSA-DAI – II
• A framework for building data clients
– Client toolkit library for application developers
• A framework for developing functionality
– Extend existing activities, or implement your own
– Mix and match activities to provide functionality you need
• Highly-extensible
– Customise our out-of-the-box product
– Provide your own services, client-side support and data-related functionality
• Comprehensive documentation and tutorials
• Latest release supports GT3.2 (to be deprecated), GT4.0, and Axis 1.2 /
OMII_2 using Java 1.4
Slide from Neil Chue Hong
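The "extend existing activities, or implement your own" and "mix and match" ideas can be illustrated with a small plug-in registry. This is a generic sketch of the extensibility pattern, not OGSA-DAI's real mechanism: the `activity` decorator, the `ACTIVITIES` table and both example activities are hypothetical.

```python
from typing import Callable, Dict, List

Activity = Callable[[str], str]

# Table of named activities (hypothetical; not OGSA-DAI's real mechanism).
ACTIVITIES: Dict[str, Activity] = {}


def activity(name: str) -> Callable[[Activity], Activity]:
    """Decorator that registers a function as a named activity."""
    def decorator(func: Activity) -> Activity:
        ACTIVITIES[name] = func
        return func
    return decorator


@activity("strip")
def strip_whitespace(data: str) -> str:
    return data.strip()


@activity("upper")
def upper_case(data: str) -> str:
    return data.upper()


def run(names: List[str], data: str) -> str:
    """Mix and match: apply the named activities in the order given."""
    for name in names:
        data = ACTIVITIES[name](data)
    return data


print(run(["strip", "upper"], "  ogsa-dai  "))  # OGSA-DAI
```

A project adds functionality by registering one more function; the `run` loop does not change.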
International Collaboration & Use
USA: Globus Alliance, IBM Corporation, caBIG, BIRN, Indiana University, GridSphere, GEON, LEAD, MCS, NCSA, Secure Data Grid, UNC
UK: OMII, OMII-UK, NGS, NCeSS, NIeeS, AstroGrid, BioSimGrid, BRIDGES, CancerGrid, ConvertGrid, eDiaMoND, EDINA, First Group plc, Fujitsu Labs Europe, GEDDM, GeneGrid, Genomic Technology and Informatics, GOLD, Human Genetics Unit, IBM UK, myGrid, Oracle UK
Europe: CERN, DataMiningGrid, GridMiner, GridSphere, inteligrid, N2Grid, OntoGrid, Provenance, SIMDAT, OMII-EU
China: CAS, ChinaGrid, cnGrid, INWA, OMII-China
Japan: AIST, BioGrid, NAREGI
South Korea: KISTI
Australia: Curtin Business School, INWA
Tutorials: Boston, CERN, Edinburgh, San Francisco, Seoul, Tokyo, Cambridge, Chicago, London, Seattle, Singapore, ISSGC 03 to 05
DIALOGUE workshops
Columbus, Edinburgh, Indiana, Vienna
Chicago, Manchester, San Diego
1485 registered users
5250+ downloads
[Pie chart: share by country]
China 40%
Others 20%
United Kingdom 15%
United States 11%
Japan 5%
France 3%
Germany 3%
Italy 2%
Austria 1%
Meeting User Requirements
FirstDIG
ConvertGrid
eDiaMoND
GeneGrid
BRIDGES
LEAD
Grid Miner
OGSA WebDB
OGSA-DQP
caBIG
It’s a Bonanza – can you keep your head?
Your insights in your subject
Foundation step in discovery
Directs your line of work
Selecting well is still a hard challenge
A Jungle of Opportunities
Retain judgement & sense of direction
Don’t be distracted by technology, …
Be a Smart Tool User
Understand and exploit tools
Demand good & persistent tools
Early adopters may win
or be mired in a techno-bog
Be a Smart Data User
Choosing data sources
– How do you find them?
– How do they describe and advertise them?
– Is the equivalent of Google possible?
– You’re an innovator ∴ Your model ≠ their model
Obtaining access to that data
– Overcoming administrative barriers
– Overcoming technical barriers
– ⇒ Negotiation & patience needed from both sides
Understanding that data
– The parts you care about for your research
Extracting nuggets from multiple sources
– Pieces of your jigsaw puzzle
Combining them using sophisticated models
– The picture of reality in your head
Analysis on scales required by statistics
– Coupling data access with computation
Repeated Processes
– Examining variations, covering a set of candidates
– Monitoring the emerging details
– Coupling with scientific workflows
Be a Smart Collaborator
Understand & Negotiate
Position, Role & Responsibilities
Deliver on your promises
On time
Record and State Provenance of Work
Build international & national
collaboration
Investment in communication and
learning
Expect good collaboration from others
CS: Take Home Message
There are plenty of Research Challenges
Workflow & DB integration, co-optimised
Distributed Queries on a global scale
Heterogeneity on a global scale
Dynamic variability
– Authorisation, Resources, Data & Schema
– Performance
Some Massive Data
Metadata for discovery, automation, repetition, …
Catalogues
Optimising Data replication & Data movement
Provenance tracking
Grasp the theoretical & practical challenges
Working in Open & Dynamic systems
Incorporate all computation
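To make the provenance-tracking challenge concrete, here is a minimal sketch of a provenance record for one processing step. The field names, the `provenance_record` helper and the example values are all assumptions for illustration; real provenance systems record far more (who, when, which software versions, which upstream records).

```python
import hashlib
import json
from typing import Any, Dict


def provenance_record(input_data: bytes, parameters: Dict[str, Any], tool: str) -> Dict[str, Any]:
    """Build a small record of what was run on which input."""
    return {
        "tool": tool,
        "parameters": parameters,
        # Hash of the input, so a later reader can check they hold the same data.
        "input_sha256": hashlib.sha256(input_data).hexdigest(),
    }


record = provenance_record(b"observation,1\n", {"threshold": 0.5}, "example-filter")
print(json.dumps(record, indent=2, sort_keys=True))
```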
E-Infrastructure Nirvana
[Layer diagram, top to bottom]
Users: Individual & Organisations
User Adaptation Layer
Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
DIK Adaptation Layer | CSC Adaptation Layer
Data, Information & Knowledge Providers | Compute, Storage & Communications Providers
E-Infrastructure: the path to Nirvana 1
[Layer diagram, top to bottom]
Users: Individual & Organisations
User Adaptation Layer
Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
Data, Information & Knowledge Providers | Compute, Storage & Communications Providers
Annotations:
– Stability, extended with improved abstractions
– New technology
– Discipline-specific toolsets, languages & portals
– User mobility
E-Infrastructure: the path to Nirvana 2
[Layer diagram, top to bottom]
Users: Individual & Organisations
User Adaptation Layer
Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
Data, Information & Knowledge Providers | Compute, Storage & Communications Providers
Annotations:
– Stability + higher-order operations + improved (semantic) metadata
E-Infrastructure: the path to Nirvana 3
[Layer diagram, top to bottom]
Users: Individual & Organisations
User Adaptation Layer
Integration, Sharing, Trust building, Resource Managing Distributed System Infrastructure
Data, Information & Knowledge Providers | Compute, Storage & Communications Providers
Annotations:
– Stability
– Increasing self-organisation & autonomic behaviour
– Automated generation of adapters & translators
Summary: Take home message
E-Infrastructure is arriving
Built on Grids & Web Services
Data and Information grow in importance
There is a dramatic rate of change
An opportunity for everyone