GridLab - Grid Application Toolkit and Testbed

GridLab
Grid Application Toolkit and Testbed
Contact:
Jarek Nabrzyski, GridLab Project Coordinator
naber@man.poznan.pl
Poznań Supercomputing and Networking Center,
Noskowskiego 12/14
61-704 Poznań, POLAND
http://www.gridlab.org
GridLab – Grid Application
Toolkit and Testbed
• Oriented towards users and grid application developers
• Budget: 6 MEuro (5,086 kEuro funded by the EC)
• Mixture of grid users, application developers, grid developers and vendors
GridLab participants
• Poznań Supercomputing and Networking Center, Poland (PSNC) – Project Coordinator
• Albert Einstein Institute, Germany
• SZTAKI, Hungary
• Masaryk University, Czech Republic
• University of Lecce, Italy
• Konrad-Zuse-Zentrum (ZIB), Germany
• Vrije Universiteit, Netherlands
• Sun Microsystems Gridware GmbH, Germany
• Compaq EMEA, France
• University of Athens, Greece
• In co-operation with
– Argonne National Laboratory (Ian Foster’s group)
– ISI (Carl Kesselman’s group)
– University of Wisconsin (Miron Livny’s group)
• 2 subcontracting sites provide additional testbed resources
• Many other partners will work unfunded (GGF Apps WG connections)
GridLab user community
• Gravitational wave detection and analysis
• Numerical relativity (black hole collisions)
• Other simulation-based users
History, origins

[Figure: EGrid Testbed; Cactus Worm; Dynamic grid computing. Labels: 12 sites, Globus, presented at SC2000]
Cactus Worm
Illustration of basic scenario
• Cactus simulation (could be anything) launched from portal
• Queries a Grid Information Server, finds available resources
• Migrates itself to next site, according to some criterion
• Registers new location to GIS, terminates old simulation
• User tracks/steers using HTTP, streaming data, etc.
If we can do this, much of what we want can be done!
• Need to work closely with Grid infrastructure developers to do this!
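The Worm's cycle (query the information server, pick the next site, register the new location, retire the old run) can be sketched as a single migration step. This is a minimal in-memory sketch. `InfoServer`, `Site`, `migrate_once` and the "most free CPUs wins" criterion are hypothetical stand-ins, not the real GIS interface or the Cactus implementation. Checkpointing and file transfer appear only as comments.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    free_cpus: int


@dataclass
class InfoServer:
    """Hypothetical stand-in for a Grid Information Server (GIS)."""
    sites: list[Site]
    registry: dict[str, str] = field(default_factory=dict)

    def query(self) -> list[Site]:
        return list(self.sites)

    def register(self, job_id: str, site_name: str) -> None:
        self.registry[job_id] = site_name


def migrate_once(job_id: str, current: str, gis: InfoServer, min_cpus: int) -> str:
    """One Worm step: query the GIS, choose a new site, register it there,
    and return the new location (the old simulation is then terminated)."""
    candidates = [s for s in gis.query()
                  if s.name != current and s.free_cpus >= min_cpus]
    if not candidates:
        return current  # nowhere suitable to go: keep running in place
    # Assumed criterion: prefer the site with the most free CPUs.
    target = max(candidates, key=lambda s: s.free_cpus)
    # A real worm would checkpoint here, copy the checkpoint to the target,
    # restart there, and only then terminate the old simulation.
    gis.register(job_id, target.name)
    return target.name
```

For example, starting at site "A" with sites A (64 CPUs), B (128) and C (16), one step with `min_cpus=32` moves the job to "B" and records that location in the registry.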
GridLab Aims
• Get computational scientists using the “Grid” and Grid services for real, everyday production work (AEI relativists, EU Network, gravitational-wave data analysis, Cactus user community).
• Make it easier for applications to make flexible, efficient and robust use of the resources available to their virtual organizations.
• Dream up, prototype and test new application scenarios that make adaptive, dynamic, wild and futuristic use of resources.
GridLab user requirements
• Large-scale simulations too big to fit on any current supercomputer
• Friendly code composition tools to build the parameter files
• Performance prediction tools
• Dynamic brokering services
• Scheduling and data management
• Dynamic grid monitoring
• Remote access tools to visualize data, monitor performance and simulation properties, and interactively steer the simulation
What GridLab Isn’t …
• Don’t want to develop low-level Grid infrastructure (just want to nudge it)
• Don’t want to repeat work which has already been done (want to incorporate and assimilate it … Globus APIs, ASC Portal (GridSphere/Orbiter), GPDK, GridPort, DataGrid, …)
Solution
• “Grid Application Toolkit” (GAT): provides a layer between applications and emerging grid technologies. It offers an API oriented towards application developers, allowing flexible use of different tools and services while protecting applications from changes in grid software that is still being developed.
• “GridLab Testbed/VO”: a diverse, controllable environment for developing and testing applications and tools, with software maintained by people who know it.
• Continuous dialogue between end users, GAT tool developers, GAT-API developers and Grid infrastructure developers.
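The following sketch makes the layering idea concrete. It is a minimal adaptor-style API: the application calls one generic `copy`, and the toolkit tries whichever backend is available, falling back to the next one on failure. `FileService`, `Adaptor` and the fake "gsiftp"/"scp" backends are hypothetical illustrations, not the actual GAT API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Adaptor:
    """One middleware backend (hypothetical), e.g. a GridFTP or scp wrapper."""
    name: str
    available: Callable[[], bool]
    copy: Callable[[str, str], str]


class FileService:
    """Application-facing API: callers never name a specific backend."""

    def __init__(self, adaptors: list[Adaptor]):
        self.adaptors = adaptors  # tried in order of preference

    def copy(self, src: str, dst: str) -> str:
        errors = []
        for a in self.adaptors:
            if not a.available():
                continue  # backend not installed or not reachable here
            try:
                return a.copy(src, dst)
            except OSError as exc:
                errors.append(f"{a.name}: {exc}")  # fall through to next backend
        raise RuntimeError("no adaptor could copy the file: " + "; ".join(errors))
```

Swapping or upgrading a backend then only means changing the adaptor list. The application code that calls `FileService.copy` stays the same, which is the protection from evolving middleware that the slide describes.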
GridLab scenario (1)
• Routine realtime analysis of gravitational wave
data from the Hannover detector identifies a
burst event, but this standard analysis reveals no
information about the burst location.
• To obtain the location, desperately required by
astrophysicists for turning their telescopes to
view the event before it fades, a large series of
templates must be cross-correlated against the
detector data.
• A German astrophysicist accesses the GEO600 portal and, using the performance tool, finds that 3 TFlop/s is needed to analyze the 100 GB of raw data within the required hour.
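The performance figure in the scenario can be unpacked with simple arithmetic. The inputs below come from the scenario text (3 TFlop/s, 100 GB, one hour). Treating GB as 10^9 bytes is an assumption. The derived quantities are only back-of-the-envelope consequences, not outputs of any GridLab tool.

```python
# Inputs taken from the scenario text.
rate_flops = 3e12        # required sustained rate: 3 TFlop/s
window_s = 3600          # "the required hour", in seconds
data_bytes = 100e9       # 100 GB of raw data (decimal gigabytes assumed)

total_flop = rate_flops * window_s        # total work to finish within the hour
flop_per_byte = total_flop / data_bytes   # work per byte of detector data
read_rate = data_bytes / window_s         # bytes/s needed just to stream the data once

print(f"{total_flop:.3g} flop, {flop_per_byte:.3g} flop/byte, {read_rate / 1e6:.1f} MB/s")
```

The total is about 1.08 × 10^16 floating-point operations, or roughly 10^5 operations per input byte. That ratio suggests the template cross-correlation is compute-bound rather than I/O-bound.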
GridLab scenario (2)
• Local resources are insufficient, so, using the brokering tool, she locates the fastest available machines around the world.
• The broker selects five suitable machines and, with the scheduling and data management tools, the data is moved, executables are created and the analysis starts.
• Twenty minutes later, in a Cracow bar, an SMS message from the portal's notification tool informs her that one machine is overloaded, breaking the runtime contract.
• She connects to the portal with her PDA and instructs the migration tool to move this part of the analysis to a different machine.
• Within the specified hour, a second SMS message tells her that the analysis is finished and the resulting data is now on her local machine. Using this location data, observatories are able to find and view an exceptionally strong gamma-ray burst, characteristic of a collision of neutron stars.
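The brokering and runtime-contract steps of the scenario can be sketched together. The scenario does not say how the broker chooses machines or how a contract breach is detected. The greedy "fastest machines first" selection, the per-machine TFlop/s figures and the 80% tolerance below are all assumptions, as are the names `select_machines` and `contract_violations`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    tflops: float  # currently available sustained rate (hypothetical metric)


def select_machines(machines: list[Machine], required_tflops: float,
                    max_count: int) -> list[Machine]:
    """Greedy broker sketch: take the fastest machines until their aggregate
    rate meets the requirement, using at most max_count machines."""
    chosen: list[Machine] = []
    total = 0.0
    for m in sorted(machines, key=lambda x: x.tflops, reverse=True):
        if total >= required_tflops or len(chosen) == max_count:
            break
        chosen.append(m)
        total += m.tflops
    if total < required_tflops:
        raise RuntimeError(f"only {total:.2f} TFlop/s available")
    return chosen


def contract_violations(promised: dict[str, float], measured: dict[str, float],
                        tolerance: float = 0.8) -> list[str]:
    """Machines delivering less than tolerance * promised rate; these are
    the candidates a notification tool would report for migration."""
    return [name for name, rate in promised.items()
            if measured.get(name, 0.0) < tolerance * rate]
```

With six candidate machines rated 1.0, 0.8, 0.6, 0.4, 0.3 and 0.1 TFlop/s, a 3 TFlop/s requirement selects the first five. If the third machine then delivers only 0.2 TFlop/s, it is the one flagged.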
Issues and key objectives
• Co-development of infrastructure and applications
– Application-driven grid technologies
– Easy and efficient use of Grid resources in a real user environment
• Dynamic Grid Computing
– Application awareness of the changing grid environment
• Investigate various user scenarios
• Design and develop a Grid Application Toolkit (GAT)
• Simultaneously enhance real applications for the Grid
• Test the Grid-enabled applications on real testbeds
• Design and develop user application portals
Grid Application Toolkit
• Application developers should be able to build simulations with tools that easily enable dynamic grid capabilities
• Want to build a programming API to easily allow:
– Querying an information server (e.g. GIIS)
• What’s available for me? What software? How many processors?
– Network monitoring
– Decision routines (Thorns)
• How to decide? Cost? Reliability? Size?
– Spawning routines (Thorns)
• Now start this up over here, and that up over there
– Authentication server
• Issues commands, moves files on your behalf
– Data transfer
• Use whatever method is desired (GSI-SSH, GSI-FTP, streamed HDF5, scp, …)
– Etc.
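A decision routine answering "Cost? Reliability? Size?" could be as simple as a filter plus a weighted score. This sketch assumes that each resource reports a cost per CPU-hour, a reliability fraction and a CPU count. It also assumes arbitrary default weights. The fields, weights and the `rank` function are hypothetical and are not taken from any Cactus thorn.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cost_per_cpu_hour: float  # e.g. allocation units (hypothetical)
    reliability: float        # fraction of recent jobs that completed, 0..1
    cpus: int


def rank(resources: list[Resource], needed_cpus: int,
         w_cost: float = 1.0, w_rel: float = 10.0) -> list[Resource]:
    """Drop resources that are too small ("Size?"), then order the rest by a
    weighted score: higher reliability is better, higher cost is worse."""
    fits = [r for r in resources if r.cpus >= needed_cpus]
    return sorted(fits,
                  key=lambda r: w_rel * r.reliability - w_cost * r.cost_per_cpu_hour,
                  reverse=True)
```

The weights express policy. A user who cares mainly about cost would raise `w_cost`, while a long production run might weight reliability more heavily.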
GridLab: New Paradigms for Dynamic Grids
• Code should be aware of its environment
– What resources are out there NOW, and what is their current state?
– What is my allocation?
– What is the bandwidth/latency between sites?
• Code should be able to make decisions on its own
– A slow part of my simulation can run asynchronously… spawn it off!
– New, more powerful resources just became available… migrate there!
– Machine went down… reconfigure and recover!
– Need more memory… get it by adding more machines!
• Code should be able to publish this information to a central server for tracking, monitoring, steering…
– Unexpected event… notify users!
– Collaborators from around the world all connect and examine the simulation.
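The "decide on its own" bullets can be read as a small policy that maps an environment snapshot to actions. This sketch is hypothetical: the `Snapshot` fields, the 1.5× speed-up threshold for migrating, and the rule of notifying users whenever any action fires are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Snapshot:
    """What the code knows about its environment at one instant (hypothetical)."""
    current_gflops: float
    best_available_gflops: float
    host_up: bool
    memory_needed_gb: float
    memory_available_gb: float


def decide(s: Snapshot, migrate_gain: float = 1.5) -> list[str]:
    actions: list[str] = []
    if not s.host_up:
        actions.append("recover")        # machine went down: restart elsewhere
    elif s.best_available_gflops >= migrate_gain * s.current_gflops:
        actions.append("migrate")        # clearly faster resource appeared
    if s.memory_needed_gb > s.memory_available_gb:
        actions.append("add_machines")   # get memory by adding machines
    if actions:
        actions.append("notify")         # publish the event to users
    return actions
```

Requiring a clear gain before migrating is one way to avoid bouncing between sites whose speeds are nearly equal.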
Dynamic Grid Computing
[Figure: a simulation moving across Sites A, B, C and D. Annotations: Go!; Find best resources; Calculate/Output Grav. Waves; Look for horizon; Found a horizon, try out excision; Calculate/Output Invariants; Add more resources; Free CPUs!!; Clone job with steered parameter; Archive data; Queue time over, find new machine]
Advanced Portal Computing
A Portal to Computational Science: Cactus Computational Toolkit
1. User has science idea...
2. Composes/builds code components w/interface... (Science, Autopilot, AMR, PETSc, HDF, MPI, GrACE, Globus, Remote Steering...)
3. Selects appropriate resources...
4. Steers simulation, monitors performance...
5. Collaborators log in to monitor...
There are a lot of generic users that need this technology.
Remote Viz and Steering
Must be able to watch/control any simulation live…
• Remote viz data (HDF5) to any viz client: LCA Vision, OpenDX, Amira
• Steering over HTTP: changing any steerable parameter
– Parameters
– Physics, algorithms
– Performance
– User preferences
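"Changing any steerable parameter" implies that the code distinguishes parameters that may change during a run from those fixed at startup. The sketch below shows that distinction. The `ParameterTable` class and its methods are hypothetical, and an HTTP handler would call `steer` with values submitted from a browser.

```python
from __future__ import annotations

from typing import Any


class ParameterTable:
    """Runtime steering sketch: only parameters declared steerable may be
    changed while the simulation runs."""

    def __init__(self) -> None:
        self._values: dict[str, Any] = {}
        self._steerable: set[str] = set()

    def declare(self, name: str, value: Any, steerable: bool = False) -> None:
        self._values[name] = value
        if steerable:
            self._steerable.add(name)

    def steer(self, name: str, value: Any) -> None:
        if name not in self._values:
            raise KeyError(name)
        if name not in self._steerable:
            raise PermissionError(f"{name} is fixed for this run")
        self._values[name] = value

    def get(self, name: str) -> Any:
        return self._values[name]
```

An output interval is a natural steerable parameter. A grid size usually is not, because changing it mid-run would invalidate the data already allocated.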
User’s View ... simple!
GridLab Architecture
Workpackages (1)
• WP1: Grid Application Toolkit (AEI)
– This is a key component of GridLab: the link between Grid middleware and applications, usable by any conforming application or middleware component. It requires input from, and connects to, most other workpackages and components.
• WP2: Cactus Grid Application Toolkit (AEI)
– Provides an extended GAT interface for Cactus, a very general toolkit framework supporting different Grid applications, from astrophysics to chemical engineering. Cactus will be one of the primary application drivers for the GAT and for the project in general.
• WP3: Work-flow Application Toolkit (CARDIFF)
– Will develop Grid capabilities for a widely used dataflow
programming environment, Triana, used in gravitational
Workpackages (2)
• WP4: Grid Portals (AEI)
– will be highly application driven, aimed at providing uniform, flexible
and intuitive user access to Grid resources from anywhere, as well as
administration tools for maintaining a Grid environment.
• WP5: Testbed management (MU)
– will administrate and maintain an active development testbed across
roughly a dozen EU sites (leveraging the work of the EGrid), deploying
technologies as they are developed by the project. This workpackage
will also coordinate with sites in the USA-based NCSA Alliance and
others to test and develop interoperability.
• WP6: Security (PSNC)
– will develop the required security mechanisms and will ensure the
integration of all the technologies developed under other WPs, taking
into account the various local security requirements and state of the art
solutions.
Workpackages (3)
• WP7: Adaptive Application Components (VU)
– develops a set of components and APIs to be plugged into the
toolkit, for example to take monitoring information and implement
basic techniques for short-term forecasting and behavior
adaptation/optimization.
• WP8: Data Handling and Visualization (ZIB)
– will provide Grid aware techniques for data management,
analysis, and visualization, needed especially for applications that
make use of multiple sites in a dynamic, time dependent manner,
leaving data unpredictably scattered across the Grid.
• WP9: Resource Management (PSNC)
– will develop resource need estimators, resource brokers, and
other tools, for both Grid users and the applications themselves to
make intelligent decisions about which Grid resources should be
used at any instant in the lifetime of a simulation.
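WP7 mentions "basic techniques for short-term forecasting" from monitoring data. Simple exponential smoothing is one such technique and is shown here only as an example. The source does not say which method WP7 adopted, and the smoothing factor `alpha` is an assumed default.

```python
from __future__ import annotations


def forecast(history: list[float], alpha: float = 0.5) -> float:
    """One-step-ahead forecast by simple exponential smoothing:
    s_t = alpha * x_t + (1 - alpha) * s_{t-1}, starting from s_0 = x_0."""
    if not history:
        raise ValueError("need at least one measurement")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    s = history[0]
    for x in history[1:]:
        s = alpha * x + (1 - alpha) * s
    return s
```

A larger `alpha` reacts faster to sudden changes, such as a newly overloaded machine, at the cost of noisier predictions. An adaptive component could feed such forecasts into decisions like when to migrate.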
Workpackages (4)
• WP10: Information Services (ISUFI)
– will extend existing Grid middleware toolkits with dynamic
features needed by applications to select appropriate Grid
resources and to provide simulation information to
collaborative user groups.
• WP11: Monitoring (SZTAKI)
– will develop new components that will fit in the general Grid
monitoring architecture to support application steering,
adaptive monitoring, and automatic analysis and prediction
of performance data.
• WP12: Access for mobile users (ZIB)
– will develop and test Grid access and monitoring
technologies through a variety of mobile devices.
Workpackages (5)
• WP13: Information Dissemination and Exploitation
(PSNC)
– will ensure the active dissemination of the project results
through a variety of channels, including active participation
in international organizations (e.g. GGF), co-development
with other Grid projects in the USA and EU, participation in
international conferences, training programs, introduction of GridLab technologies into various communities, and introduction into the commercial vendor world.
• WP14: Project Management (PSNC)
– day-to-day scientific, financial and administrative
management of the project, including careful orchestration
and monitoring of work across groups, major project
decisions, liaisons with external projects and with the
international advisory board, and reporting.
International Advisory Board
• Domenico Laforenza, CNUCE
• Thierry Priol, INRIA
• Lennart Johnsson, Director of the Texas Learning and Computation Center, University of Houston; strong European relations (PDC)
• Daniel Reed, Director of NCSA
• Paul Avery, PI of the GriPhyN project
More info
• www.gridlab.org
• www.gridforum.org
• http://www.zib.de/ggf/apps