GridLab
Grid Application Toolkit and Testbed

Contact: Jarek Nabrzyski, GridLab Project Coordinator
naber@man.poznan.pl
Poznań Supercomputing and Networking Center, Noskowskiego 12/14, 61-704 Poznań, POLAND
http://www.gridlab.org

GridLab – Grid Application Toolkit and Testbed
• User and Grid Application Developer oriented
• Budget: 6 MEuro (5086k funded by the EC)
• Mixture of grid users, application developers, grid developers and vendors

GridLab participants
• Poznań Supercomputing and Networking Center (PSNC), Poland – Project Coordinator
• Albert Einstein Institute, Germany
• SZTAKI, Hungary
• Masaryk University, Czech Republic
• University of Lecce, Italy
• Konrad-Zuse-Zentrum (ZIB), Germany
• Vrije Universiteit, Netherlands
• Sun Microsystems Gridware GmbH, Germany
• Compaq EMEA, France
• University of Athens, Greece
• In co-operation with
  – Argonne National Laboratory (Ian Foster’s group)
  – ISI (Carl Kesselman’s group)
  – University of Wisconsin (Miron Livny’s group)
• 2 subcontracting sites provide additional testbed resources
• Many other partners will work unfunded (GGF Apps WG connections)

GridLab user community
• Gravitational wave detection and analysis
• Numerical relativity (black hole collisions)
• Other simulation-based users

History, origins
• EGrid Testbed
• Cactus worm
• Dynamic grid computing
• 12 sites
• Globus
• Presented at SC2000

Cactus Worm
Illustration of basic scenario
• Cactus simulation (could be anything) launched from portal
• Queries a Grid Information Server, finds available resources
• Migrates itself to next site, according to some criterion
• Registers new location to GIS, terminates old simulation
• User tracks/steers, using http, streaming data, etc.
If we can do this, much of what we want can be done!
• Need to work closely with Grid Infrastructure developers to do this!
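
The Cactus Worm steps above (query, pick a site, migrate, register) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Site`, `InformationServer` and `worm_step` are hypothetical names, the information server is an in-memory stand-in rather than a real GIS, and "most free CPUs" is an assumed example of "some criterion". Checkpointing, file transfer and restart are left as a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    free_cpus: int


@dataclass
class InformationServer:
    """Hypothetical in-memory stand-in for a Grid Information Server (GIS)."""
    sites: dict
    locations: dict = field(default_factory=dict)

    def available_sites(self, exclude):
        """Sites other than `exclude` that currently report free CPUs."""
        return [s for s in self.sites.values()
                if s.name != exclude and s.free_cpus > 0]

    def register(self, job_id, site_name):
        """Record where a job runs now, replacing any earlier entry."""
        self.locations[job_id] = site_name


def worm_step(job_id, current, gis):
    """One worm hop: query the GIS, pick a target, move, register.

    Returns the name of the site the job runs on afterwards.
    """
    candidates = gis.available_sites(exclude=current)
    if not candidates:
        return current  # nowhere to go, keep running where we are
    # Assumed criterion: the site with the most free CPUs.
    target = max(candidates, key=lambda s: s.free_cpus)
    # A real worm would checkpoint, transfer its state, restart on the
    # target and terminate the old simulation here.
    gis.register(job_id, target.name)
    return target.name


gis = InformationServer({
    "A": Site("A", 8),
    "B": Site("B", 16),
    "C": Site("C", 64),
})
gis.register("cactus-run", "A")
print(worm_step("cactus-run", "A", gis))
```

Keeping the selection criterion in a single expression makes it easy to swap in cost, reliability or queue time without touching the migration logic.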

GridLab Aims
• Get Computational Scientists using the “Grid” and Grid services for real, everyday, production work (AEI Relativists, EU Network, Grav Wave Data Analysis, Cactus User Community).
• Make it easier for applications to make flexible, efficient, robust use of the resources available to their virtual organizations.
• Dream up, prototype, and test new application scenarios which make adaptive, dynamic, wild, and futuristic uses of resources.

GridLab user requirements
• Large scale simulations too big to fit on any current supercomputer
• Friendly code composition tools to build the parameter files
• Performance prediction tools
• Dynamic brokering services
• Scheduling and data management
• Dynamic grid monitoring
• Remote access tools to visualize data, monitor performance and simulation properties, and interactively steer the simulation

What GridLab Isn’t ...
• Don’t want to develop low level Grid Infrastructure (just want to nudge it)
• Don’t want to repeat work which has already been done (want to incorporate and assimilate it ... Globus APIs, ASC Portal (GridSphere/Orbiter), GPDK, GridPort, DataGrid, ...)

Solution
“Grid Application Toolkit”
Provides a layer between applications and emerging grid technologies. Provides an application-developer-oriented API, allowing the flexible use of different tools and services, as well as insulating applications from underlying software that is still under development.
“GridLab Testbed/VO”
Diverse controllable environment for developing and testing applications and tools, with software maintained by people who know it.
[Diagram: continuous dialogue among End Users, GAT Tool Developers, GAT-API Developers and Grid Infrastructure Developers]

GridLab scenario (1)
• Routine realtime analysis of gravitational wave data from the Hannover detector identifies a burst event, but this standard analysis reveals no information about the burst location.
• To obtain the location, desperately required by astrophysicists for turning their telescopes to view the event before it fades, a large series of templates must be cross-correlated against the detector data.
• A German astrophysicist accesses the GEO600 Portal, and using the performance tool finds that 3 TFlop/s is needed to analyze the 100 GB of raw data in the required hour.

GridLab scenario (2)
• Local resources are insufficient, so using the brokering tool, she locates the fastest available machines around the world.
• Broker selects five suitable machines, and with scheduling and data management tools, data is moved, executables created and the analysis starts.
• In a Cracow bar twenty minutes later, an SMS message from the portal's notification tool informs her that one machine is overloaded, breaking the runtime contract.
• She connects with her PDA to the portal, and instructs the migration tool to move this part of the analysis to a different machine.
• Within the specified hour, a second SMS message tells her that analysis is finished, and the resulting data is now on her local machine. Using this location data, observatories are able to find and view an exceptionally strong gamma-ray burst, characteristic of a collision of neutron stars.
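
The brokering and runtime-contract steps of this scenario can be sketched as below. This is a hedged illustration, not the GridLab broker: `Machine`, `select_machines` and `contract_broken` are hypothetical names, the per-machine rates are invented so that the five fastest machines add up to the 3 TFlop/s from the scenario, and the 80% tolerance in the contract check is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    tflops: float  # promised sustained rate in TFlop/s (invented numbers)


def select_machines(machines, required_tflops):
    """Greedily take the fastest machines until the requirement is covered."""
    chosen, total = [], 0.0
    for m in sorted(machines, key=lambda m: m.tflops, reverse=True):
        if total >= required_tflops:
            break
        chosen.append(m)
        total += m.tflops
    if total < required_tflops:
        raise ValueError("available machines cannot meet the requirement")
    return chosen


def contract_broken(promised_tflops, measured_tflops, tolerance=0.8):
    """True if a machine delivers less than `tolerance` of its promise.

    The 0.8 default is an assumption made for this sketch.
    """
    return measured_tflops < tolerance * promised_tflops


# Hypothetical pool of machines offered to the broker.
pool = [
    Machine("m1", 1.0), Machine("m2", 0.75), Machine("m3", 0.5),
    Machine("m4", 0.5), Machine("m5", 0.25), Machine("m6", 0.25),
    Machine("m7", 0.125),
]

chosen = select_machines(pool, required_tflops=3.0)
print([m.name for m in chosen])

# An overloaded machine delivering 0.3 of a promised 0.5 TFlop/s would
# trigger the notification and migration steps described above.
print(contract_broken(promised_tflops=0.5, measured_tflops=0.3))
```

A greedy choice by speed mirrors "locates the fastest available machines"; a production broker would also weigh data location, cost and queue time.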

Issues and key objectives
• Co-development of Infrastructure and Applications
  – Application driven grid technologies
  – Easy and efficient use of Grid resources in a real user environment
• Dynamic Grid Computing
  – Application awareness of the changing grid environment
• Investigate various user scenarios
• Design and develop a Grid Application Toolkit (GAT)
• Simultaneously enhance real applications for the Grid
• Test the Grid-enabled applications on real test beds
• Design and develop user application portals

Grid Application Toolkit
• Application developer should be able to build simulations with tools that easily enable dynamic grid capabilities
• Want to build programming API to easily allow:
  – Query information server (e.g. GIIS)
    • What’s available for me? What software? How many processors?
  – Network Monitoring
  – Decision Routines (Thorns)
    • How to decide? Cost? Reliability? Size?
  – Spawning Routines (Thorns)
    • Now start this up over here, and that up over there
  – Authentication Server
    • Issues commands, moves files on your behalf
  – Data Transfer
    • Use whatever method is desired (Gsi-ssh, Gsi-ftp, Streamed HDF5, scp...)
  – Etc.

GridLab: New Paradigms for Dynamic Grids
• Code should be aware of its environment
  – What resources are out there NOW, and what is their current state?
  – What is my allocation?
  – What is the bandwidth/latency between sites?
• Code should be able to make decisions on its own
  – A slow part of my simulation can run asynchronously... spawn it off!
  – New, more powerful resources just became available... migrate there!
  – Machine went down... reconfigure and recover!
  – Need more memory... get it by adding more machines!
• Code should be able to publish this information to central server for tracking, monitoring, steering...
  – Unexpected event... notify users!
  – Collaborators from around the world all connect, examine simulation.

Dynamic Grid Computing
[Figure: a simulation moving between Sites A, B, C and D. Labels: Find best resources; Go!; Look for horizon; Found a horizon, try out excision; Calculate/Output Grav. Waves; Calculate/Output Invariants; Add more resources; Clone job with steered parameter; Archive data; Free CPUs!!; Queue time over, find new machine]

Advanced Portal Computing
A Portal to Computational Science: Cactus Computational Toolkit
1. User has science idea...
2. Composes/Builds Code Components w/Interface...
3. Selects Appropriate Resources...
4. Steers simulation, monitors performance...
5. Collaborators log in to monitor...
Components: Science, Autopilot, AMR, Petsc, HDF, MPI, GrACE, Globus, Remote Steering...
There are a lot of generic users that need this technology

Remote Viz and Steering
Must be able to watch/control any simulation live...
Changing any steerable parameter:
• Parameters
• Physics, algorithms
• Performance
• User Preferences
[Diagram labels: HTTP; HDF5; Remote Viz data; Any Viz Client: LCA Vision, OpenDX, Amira]

User’s View ... simple!

GridLab Architecture

Workpackages (1)
• WP1: Grid Application Toolkit (AEI)
  – A key component of GridLab: the link between Grid middleware and applications, usable by any conforming application or middleware component. It requires input from, and connects to, most other workpackages and components.
• WP2: Cactus Grid Application Toolkit (AEI)
  – provides an extended GAT interface for Cactus, a very general toolkit framework supporting different Grid applications, from astrophysics to chemical engineering. Cactus will be one of the primary application drivers for the GAT, and the project generally.
• WP3: Work-flow Application Toolkit (CARDIFF)
  – will develop Grid capabilities for a widely used dataflow programming environment, Triana, used in gravitational

Workpackages (2)
• WP4: Grid Portals (AEI)
  – will be highly application driven, aimed at providing uniform, flexible and intuitive user access to Grid resources from anywhere, as well as administration tools for maintaining a Grid environment.
• WP5: Testbed management (MU)
  – will administer and maintain an active development testbed across roughly a dozen EU sites (leveraging the work of the EGrid), deploying technologies as they are developed by the project. This workpackage will also coordinate with sites in the USA-based NCSA Alliance and others to test and develop interoperability.
• WP6: Security (PSNC)
  – will develop the required security mechanisms and will ensure the integration of all the technologies developed under other WPs, taking into account the various local security requirements and state of the art solutions.

Workpackages (3)
• WP7: Adaptive Application Components (VU)
  – develops a set of components and APIs to be plugged into the toolkit, for example to take monitoring information and implement basic techniques for short-term forecasting and behavior adaptation/optimization.
• WP8: Data Handling and Visualization (ZIB)
  – will provide Grid aware techniques for data management, analysis, and visualization, needed especially for applications that make use of multiple sites in a dynamic, time dependent manner, leaving data unpredictably scattered across the Grid.
• WP9: Resource Management (PSNC)
  – will develop resource need estimators, resource brokers, and other tools, for both Grid users and the applications themselves to make intelligent decisions about which Grid resources should be used at any instant in the lifetime of a simulation.

Workpackages (4)
• WP10: Information Services (ISUFI)
  – will extend existing Grid middleware toolkits with dynamic features needed by applications to select appropriate Grid resources and to provide simulation information to collaborative user groups.
• WP11: Monitoring (SZTAKI)
  – will develop new components that will fit in the general Grid monitoring architecture to support application steering, adaptive monitoring, and automatic analysis and prediction of performance data.
• WP12: Access for mobile users (ZIB)
  – will develop and test Grid access and monitoring technologies through a variety of mobile devices,

Workpackages (5)
• WP13: Information Dissemination and Exploitation (PSNC)
  – will ensure the active dissemination of the project results through a variety of channels, including active participation in international organizations (e.g. GGF), co-development with other Grid projects in the USA and EU, participation in international conferences, training programs, and introduction of GridLab technologies into various communities and into the commercial vendor world.
• WP14: Project Management (PSNC)
  – day-to-day scientific, financial and administrative management of the project, including careful orchestration and monitoring of work across groups, major project decisions, liaisons with external projects and with the international advisory board, reporting

International Advisory Board
• Domenico Laforenza, CNUCE
• Thierry Priol, INRIA
• Lennart Johnsson, Director of the Texas Learning and Computation Center, University of Houston; strong European relations (PDC)
• Daniel Reed, Director of the NCSA
• Paul Avery, PI of the GriPhyN project

More info
• www.gridlab.org
• www.gridforum.org
• http://www.zib.de/ggf/apps