Applications

The Grid
a brief briefing
Carole Goble
Information Management
Group
Roadmap



What is the Grid?
Example projects
Relationship to the Semantic Web


Example architectures
The international programme
Take Home




The Grid is an international activity
The Grid has attracted high profile
industrial and government support and
funding
The Information/Knowledge Grid is in
many ways indistinguishable from the
Semantic Web
The Grid Community’s understanding of
generic and theoretical issues for the IK
Grid is immature and hackery.
So what’s the Grid?
Isn’t it just High Performance
Computing for High Energy
Physicists?
Why Grids?


Large-scale science and engineering are done
through the interaction of people,
heterogeneous computing resources,
information systems, and instruments, all of
which are geographically and organizationally
dispersed.
The overall motivation for “Grids” is to
facilitate the routine interactions of these
resources in order to support large-scale
science and engineering.
From Bill Johnston 27 July 01
CERN: Large Hadron Collider (LHC)
Raw Data: 1 Petabyte / sec
Filtered 100Mbyte / sec = 1 Petabyte / year = 1 Million CD ROMs
CMS Detector
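The LHC figures above can be checked with a few lines of arithmetic. This is only a rough sketch: the slide does not say how many seconds of data taking make up a year or how large a CD-ROM is, so the values of `seconds_per_year` (about 10^7 s) and `cd_capacity` (650 MB) are assumptions, as is the use of decimal units.

```python
# Back-of-envelope check of the LHC data-volume figures on the slide.
# ASSUMPTIONS (not from the slide): ~1e7 seconds of data taking per year,
# 650 MB per CD-ROM, decimal units (1 PB = 1e15 bytes).
MB = 10**6
PB = 10**15

raw_rate = 1 * PB            # bytes per second before filtering (slide)
filtered_rate = 100 * MB     # bytes per second after filtering (slide)
seconds_per_year = 10**7     # assumed live time per year
cd_capacity = 650 * MB       # assumed CD-ROM capacity

bytes_per_year = filtered_rate * seconds_per_year
petabytes_per_year = bytes_per_year / PB
cds_per_year = bytes_per_year / cd_capacity
reduction_factor = raw_rate // filtered_rate

print(f"{petabytes_per_year:.1f} PB/year")
print(f"{cds_per_year / 1e6:.1f} million CD-ROMs/year")
print(f"filtering reduces the data rate by a factor of {reduction_factor:,}")
```

Under these assumptions the filtered stream comes to 1 Petabyte / year, and the CD-ROM count lands at the same order of magnitude as the slide's 1 million.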
Why Grids?




A biochemist exploits 10,000 computers to
screen 100,000 compounds in an hour;
A biologist combines a range of diverse and
distributed resources (databases, tools,
instruments) to answer complex questions;
1,000 physicists worldwide pool resources for
petaop analyses of petabytes of data
Civil engineers collaborate to design, execute,
& analyze shake table experiments
From Steve Tuecke 12 Oct. 01
Why Grids? (contd.)




Climate scientists visualize, annotate, &
analyze terabyte simulation datasets
An emergency response team couples real
time data, weather model, population data
A multidisciplinary analysis in aerospace
couples code and data in four companies
A home user invokes architectural design
functions at an application service provider
From Steve Tuecke 12 Oct. 01
Why Grids? (contd.)



An application service provider
purchases cycles from compute cycle
providers
Scientists working for a multinational
soap company design a new product
A community group pools members’ PCs
to analyze alternative designs for a local
road
From Steve Tuecke 12 Oct. 01
The Grid Vision

“…flexible, secure, coordinated
resource-sharing among
dynamic collections of
individuals, institutions, and
resources–what we refer to as
virtual organisations”

“The Anatomy of the Grid: Enabling
Scalable Virtual Organizations” Foster,
Kesselman and Tuecke, 2001
The Grid Problem

Enable communities (“virtual
organizations”) to share geographically
distributed resources as they pursue
common goals -- assuming the absence
of…




central location,
central control,
omniscience,
existing trust relationships.
From Steve Tuecke 12 Oct. 01
Visualisation
stretch
Large scale
Multi-disciplinary simulation
Decision support and optimization
Virtual prototyping
Collaborative analysis and visualization
Data
Computation
Large scale distributed data management
Large scale distributed computation
High speed communications
Dynamic collaborative virtual organisations
What is it?
Where is it?
How to get it?
When did it happen?
Who knows it?
Why does it?
What are you doing?
interrogation
results
workflows
Technology Grid
Governance
& Control
Collaboration Grid
Online Access to
Scientific Instruments
Advanced Photon Source
wide-area
dissemination
real-time
collection
archival
storage
desktop & VR
clients with
shared controls
tomographic
reconstruction
DOE X-ray grand challenge: ANL, USC/ISI, NIST,
U.Chicago
From Steve Tuecke 12 Oct. 01
Supernova Cosmology
Network for Earthquake
Engineering Simulation


NEESgrid: national infrastructure to
couple earthquake engineers with
experimental facilities, databases,
computers, & each other
On-demand access to experiments, data
streams, computing, archives,
collaboration
NEESgrid: Argonne, Michigan, NCSA, UIUC, USC
From Steve Tuecke 12 Oct. 01
Home Computers
Evaluate AIDS Drugs

Community =




1000s of home
computer users
Philanthropic
computing vendor
(Entropia)
Research group
(Scripps)
Common goal=
advance AIDS
research
From Steve Tuecke 12 Oct. 01
myGrid



Personalised extensible
environments for data-intensive
in silico experiments in biology
Straightforward discovery,
interoperation, sharing
Workflow oriented



provenance
propagating change
Individual creativity &
collaborative working

personalisation
myGrid
resources
Question:
Nucleotide binding protein in mouse
Answer:
P12345 in Swiss-Prot is an ATPase
Terri Attwood is an expert on this
Jackson Labs have a database but you need to
register
A paper has just been published in Proteins by
the Stanford lab on this.
GeoDISE – engineering design
optimisation






Access to knowledge repository
Access to optimisation and search tools
Industrial analysis codes
Distributed computing and data resources in
design optimisation
Applied to industrial problems - large scale
CFD codes
Demonstrate scalability across distributed
computational and data resources and teams
of designers
GeoDISE
Modern engineering firms are global and distributed
How to … ?
… improve design environments
… cope with legacy code / systems
… produce optimized designs
CAD and analysis tools, user
interfaces, PSEs, and Visualization
Optimisation methods
… integrate large-scale systems in a
flexible way
Management of distributed
compute and data resources
… archive and re-use design history
Data archives (e.g. design/ system
usage)
… capture and re-use knowledge

Knowledge repositories &
knowledge capture and reuse
tools.
“Not just a problem of using HPC”
Virtual Sky http://virtualsky.org/
Broader Context

“Grid Computing” has much in common with
major industrial thrusts


Business-to-business, Peer-to-peer, Application
Service Providers, Storage Service Providers,
Distributed Computing, Internet Computing…
Sharing issues not adequately addressed by
existing technologies


Complicated requirements: “run program X at site
Y subject to community policy P, providing
access to data at Z according to policy Q”
High performance: unique demands of advanced
& high-performance systems
From Steve Tuecke 12 Oct. 01
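The “complicated requirements” example above (“run program X at site Y subject to community policy P, providing access to data at Z according to policy Q”) can be sketched as a request that must satisfy several independent policies. This is a minimal illustration, not any Grid toolkit's API: `Request`, `community_policy`, `data_policy`, `authorize` and the user and program names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Request:
    """Hypothetical request: who wants to run what, where, on which data."""
    user: str
    program: str       # "program X"
    compute_site: str  # "site Y"
    data_site: str     # "data at Z"


def community_policy(req: Request) -> bool:
    # Policy P (assumed): only community members may run approved programs.
    return req.user in {"alice", "bob"} and req.program in {"X"}


def data_policy(req: Request) -> bool:
    # Policy Q (assumed): the data owner at Z grants access to alice only.
    return req.data_site == "Z" and req.user == "alice"


def authorize(req: Request, policies: Iterable[Callable[[Request], bool]]) -> bool:
    # Sharing is conditional: every policy owner has to agree.
    return all(policy(req) for policy in policies)


policies = [community_policy, data_policy]
alice_ok = authorize(Request("alice", "X", "Y", "Z"), policies)
bob_ok = authorize(Request("bob", "X", "Y", "Z"), policies)
print(alice_ok, bob_ok)
```

The point of the sketch is that P and Q are written and owned by different parties, so no single site can decide the request alone.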
Elements of the Problem

Resource sharing
Computers, storage, sensors, networks, …
Sharing always conditional: issues of trust, policy,
negotiation, payment, …
Coordinated problem solving
Beyond client-server: distributed data analysis,
computation, collaboration, …
Dynamic, multi-institutional virtual
organisations
Community overlays on classic org structures
Large or small, static or dynamic
Problem Solving Environments
The Globus Project™





Close collaboration with real Grid projects in science
and industry
Development and promotion of standard Grid
protocols to enable interoperability and shared
infrastructure
Development and promotion of standard Grid software
APIs and SDKs to enable portability and code sharing
The Globus Toolkit™: Open source, reference
software base for building grid infrastructure and
applications
Global Grid Forum: Development of standard
protocols and APIs for Grid computing
From Steve Tuecke 12 Oct. 01
Doesn’t Globus solve it all?

Globus Toolkit is focused on the
Data/Computational layer






No database connectivity
Little brokering, and static not dynamic
Weak metadata management, workflow
Trashes firewalls
No, not everything is JCL, FTP and LDAP
Distributed computation dominates
etc…etc…
Is it done?

NASA Information Power Grid (IPG) is the only one really
working




http://www.ipg.nasa.gov
Linking similar supercomputers owned by
the same organisation
Computation-focused
High Energy Physics is atypical
Example Application Projects









AstroGrid: astronomy, etc.
(UK)
Earth Systems Grid:
environment (US DOE)
EU DataGrid: physics,
environment, etc. (EU)
EuroGrid: various (EU)
Fusion Collaboratory (US
DOE)
GridLab: astrophysics, etc.
(EU)
Grid Physics Network (US
NSF)
MetaNEOS: numerical
optimization (US NSF)
NEESgrid: civil engineering
(US NSF)









RealityGrid (UK)
DAME (UK)
Comb-e-Chem (UK)
GeoDISE (UK)
iVDGL, StarLight (US/EU)
DiscoveryNet (UK)
myGrid (UK)
GridPP (UK)
Particle Physics Data Grid
(US DOE)

etc…
“ … Since the early days of mankind the
primary motivation for the establishment of
communities has been the idea that by being
part of an organized group the capabilities of
an individual are improved. The great
progress in the area of inter-computer
communication led to the development of
means by which stand-alone processing subsystems can be integrated into multicomputer ‘communities’. … ”
Miron Livny, “Study of Load Balancing Algorithms for
Decentralized Distributed Processing Systems”,
Ph.D. thesis, July 1983.
Every Community needs a
Matchmaker!


Condor uses Matchmakers to build
Computing Communities out of
Commodity Components
.. someone has to bring together
community members who have requests
for goods and services with members
who offer them.



Both sides are looking for each other
Both sides have constraints
Both sides have preferences
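The matchmaking idea above, where both sides have constraints and both sides have preferences, can be sketched in a few lines. This is an illustration only and not Condor's ClassAd language or API: the `Ad` class, the `match` function and all attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Ad:
    """Hypothetical advertisement published by a requester or a provider."""
    name: str
    attrs: dict
    requirements: Callable[[dict], bool]  # constraint on the other side
    rank: Callable[[dict], float]         # preference over the other side


def match(request: Ad, offers: List[Ad]) -> Optional[Ad]:
    # Keep only offers where BOTH sides' constraints are satisfied.
    compatible = [o for o in offers
                  if request.requirements(o.attrs) and o.requirements(request.attrs)]
    if not compatible:
        return None
    # Prefer the requester's ranking, then the provider's as a tie-break.
    return max(compatible, key=lambda o: (request.rank(o.attrs), o.rank(request.attrs)))


job = Ad("job", {"owner": "alice"},
         requirements=lambda m: m["memory"] >= 512 and m["os"] == "linux",
         rank=lambda m: m["mips"])

machines = [
    Ad("m1", {"memory": 1024, "os": "linux", "mips": 300},
       requirements=lambda j: j["owner"] != "mallory", rank=lambda j: 0),
    Ad("m2", {"memory": 2048, "os": "linux", "mips": 500},
       requirements=lambda j: j["owner"] == "bob", rank=lambda j: 0),
    Ad("m3", {"memory": 256, "os": "linux", "mips": 900},
       requirements=lambda j: True, rank=lambda j: 0),
    Ad("m4", {"memory": 1024, "os": "linux", "mips": 200},
       requirements=lambda j: True, rank=lambda j: 0),
]

best = match(job, machines)
print(best.name if best else "no match")
```

Here m2 is faster but only serves bob, and m3 has too little memory, so the matchmaker chooses between m1 and m4 using the job's own preference for speed.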
Let’s look at some
Architectures
Desiderata







(adapted from Globus)
Software development toolkits
e.g. Globus toolkit
Standard protocols, services &
APIs
A modular “bag of technologies”
Enable incremental development
of grid-enabled tools and
applications
Reference implementations
Learn through deployment and
applications
Open source
Applications
Diverse global services
Core
services
Local OS
Globus Layered Grid Architecture
CERN - High Energy Physics
Application
Collective (“Coordinating multiple resources”): ubiquitous infrastructure services, app-specific distributed services
Resource (“Sharing single resources”): negotiating access, controlling use
Connectivity (“Talking to things”): communication (Internet protocols) & security
Fabric (“Controlling things locally”): access to, & control of, resources
Internet Protocol Architecture (shown alongside for comparison): Application, Transport, Internet, Link
From Steve Tuecke 12 Oct. 01
Keith Jeffery
"Reproduced by permission of the IT Innovation Centre,
University of Southampton."
http://www.it-innovation.soton.ac.uk
Three Layer Grid Abstraction
[Figure: the Grid as three layers between scientific problems and raw resources]
Layers: Knowledge (knowledge / capability), Information (semantics / process), Data (data / applications)
Other labels: value chain, processes, jobs and data
Services: interoperability, higher level ontologies, reasoning services, discovery services, fulfillment
Architecture of a Grid
[Figure: layered Grid architecture, top to bottom; a legend marks some of the common services as Globus services]
Discipline Specific Portals and Scientific Workflow Management Systems
Applications: Simulations, Data Analysis, etc.
Toolkits: Visualization, Data Publication/Subscription, etc.
Grid Common Services: Standardized Services and Resources Interfaces
Grid Information Service, Uniform Resource Access, Brokering, Global Queuing, CoScheduling, Global Event Services, Data Cataloguing, Uniform Data Access, Collaboration and Remote Instrument Services, Network Cache, Communication Services, Authentication, Authorization, Security Services, Auditing, Monitoring, Fault Management
Distributed Resources: clusters, Condor pools, national supercomputer facilities, tertiary storage, network caches, national user facilities
High-speed Networks and Communications Services
Architecture of a Grid – upper layers
Problem Solving Environments
• Knowledge based query
• Tools to implement the human interfaces, e.g. SciRun, ECCE, WebFlow, .....
• Mechanisms to express, organize, and manage the workflow of problem solutions (“frameworks”)
• Access control
Applications and Supporting Tools: application codes, visualization toolkits, collaboration toolkits, instrument management toolkits, data publish and subscribe toolkits
Application Development and Execution Support: Globus MPI, Condor-G, CORBA, DCOM, Java/Jini
Grid enabled libraries (security, communication services, data access, global event management, etc.)
Grid Common Services
Distributed Resources
“Knowledge Based” Data Grids
[Figure: grid of three levels (Knowledge, Information, Data) against three functions (Ingest Services, Management, Access Services)]
Knowledge: Relationships Between Concepts; Knowledge Repository for Rules; Knowledge or Topic-Based Query / Browse
Information: Attributes, Semantics; Information Repository; Attribute-based Query
Data: Fields, Containers, Folders; Storage (Replicas, Persistent IDs); Feature-based Query
Other labels: Rules - KQL; XTM DTD; SDLIP; XML DTD; (Model-based Access); MCAT/HDF; Grids; (Data Handling System - SRB)
National Partnership for Advanced Computational Infrastructure
Astronomy Sky Survey Data Grid
1. Portals and Workbenches
2. Knowledge & Resource Management: Concept space
3. Metadata View and Data View: Bulk Data Analysis, Catalog Analysis
4. Grid: Security, Caching, Replication, Backup, Scheduling
Standard APIs and Protocols
5. Information Discovery, Metadata delivery, Data Discovery, Data Delivery
Standard Metadata format, Data model, Wire format
6. Catalog Mediator, Data mediator
Catalog/Image Specific Access
7. Compute Resources, Derived Collections, Catalogs, Data Archives
[Figure: NSDL architecture]
User Interfaces: Portals & Clients
Delivery: Presentation, Aggregation - Channels
Collections: NSDL Collections, referenced items & collections, Virtual Collections & Mediators, information about collections
Core NSDL Bus: Meta-data delivery, Data delivery, Query, Global Ids, Security, Network
Core Collection-Building Services: harvesting, metadata normalizing, persistent storage
Usage Enhancement: NSDL Services, Other NSDL Services, Metadata & data access-based services
Core CI Services: annotation, query, transform, topic-map registry, personalization, discussion, visualization...
ERA Concept model
The De Roure Triangle
Grid
Computing
eScience
Agents
eBusiness
?
Web Services
Semantic Web
Roy Williams Paul Messina
California Institute of Technology
So what is going on?
UK: http://www.escience-grid.org.uk/
International: http://www.gridforum.org/
E-Science Programme
DG Research Councils
E-Science Steering Committee
Grid TAG
Director
Director’s Awareness and Co-ordination Role
Director’s Management Role
Generic Challenges: EPSRC (£15m), DTI (£15m)
Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
£80m Collaborative projects
Industrial Collaboration (£40m)
From Tony Hey 27 July 01
Key Elements of
UK Grid Development Plan


Development of Generic Grid Middleware
Network of Grid Core Programme e-Science
Centres







National Centre http://www.nesc.ac.uk/
Regional Centres http://www.esnw.ac.uk/
Grid IRC Grand Challenge Project
Support for e-Science Pilots
Short term funding for e-Science
demonstrators
Grid Network Team * Grid Engineering Team
Grid Support Centre * Task Forces
Adapted from Tony Hey 27 July 01
Take Home




The Grid is an international activity
The Grid has attracted high profile
industrial and government support and
funding
The Information/Knowledge Grid is in
many ways indistinguishable from the
Semantic Web
The Grid Community’s understanding of
generic and theoretical issues for the IK
Grid is immature and hackery.
Spares
Grid viewpoints
What is it?
Where is it?
How to get it?
When did it happen?
Who knows it?
Why does it?
What are you doing?
interrogation
results
private
public
Technology Grid
Governance
& Control
Access Grid
New Biology
workflows
Particle Physics and Astronomy Research
Council (PPARC)

GridPP (http://www.gridpp.ac.uk/)


to develop the Grid technologies required to
meet the LHC computing challenge
ASTROGRID
(http://www.astrogrid.ac.uk/)

a ~£4M project aimed at building a data-grid
for UK astronomy, which will form the UK
contribution to a global Virtual Observatory
Infrastructure Deployments
Institutional Grid deployments: deploying
services and network infrastructure
DISCOM, IPG, TeraGrid, DOE Science Grid, DOD
Grid, NEESgrid, ASCI (Netherlands)
International deployments: supporting
international experiments and science
iVDGL, StarLight
Support centers


U.K. Grid Center
U.S. GRIDS Center