Gabrielle Allen gallen@cct.lsu.edu
Cactus Retreat
Baton Rouge, April 2004
User Support
Applications
Research and Development
Support for more numerical models
(Saturday)
Large scale computing
Frameworks
Visualization
Data models and formats
Grid computing
Adaptive Mesh Refinement
PAGH (Wash U)
Carpet (Albert Einstein Institute)
Application Performance Modeling
University of Chicago
Data Formats and Visualization
Lawrence Berkeley Lab
Konrad Zuse Zentrum
Albert Einstein Institute
Optimization and Performance
NCSA, Intel, Cray, Lawrence Berkeley Lab, Absoft
User GUIs
Wash U
Pending proposal to NSF ITR program
“Hypercode: Interoperable Infrastructure Initiative”
Addressing interoperability for general computational infrastructures: Cactus, Chombo, Paramesh and CCA-based frameworks
Focused around applications:
Numerical Relativity
Computational Astrophysics
Coastal Modeling
Climate Modeling
Cosmology
Computational Fluid Dynamics
Develop common mechanisms and abstractions to enable different simulation codes to use modules and data interchangeably
Partners:
LSU
LBL (John Shalf, Phil Colella, Julian Borrill)
U. Maryland (Kevin Olson, Joan Centrella)
U. Indiana (Dennis Gannon)
NCSA (Greg Daues)
Main developments for Cactus:
Incorporate other AMR drivers into Cactus
– Chombo, Paramesh
Incorporate other elliptic solvers into Cactus
Develop a community toolkit for CFD
Add new features to Cactus
– Adaptivity
– Dynamic component loading
Develop common data model
– New visualization tools available
Ongoing visualization projects at AEI, LSU, LBL with Cactus
Pending DST-NSF proposal with the computer science department and C-DAC to build visualization infrastructure to allow data to be analyzed and visualized on the fly
Additions to Cactus I/O infrastructure
Thorns for visualization
Web-based visualization tools
NSF Software Technologies for High End Computing
(being written)
Incorporate fault tolerant MPI in Cactus driver layer
University of Tennessee
Develop HTTPD thorn into an interactive, real time parallel debugger
Detect and exploit memory and network connection hierarchy from processor cache, through node layout on clusters of SMPs, to cluster interconnections on the Grid
Performance monitoring and adaptation using, e.g., the PAPI library
“resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations”
… infrastructure enabling the integrated, collaborative use of high-end computers, networks, databases, scientific instruments owned and managed by multiple organizations …
… applications often involve large amounts of data and/or computing, secure resource sharing across organizational boundaries, not easily handled by today’s Internet and Web infrastructures …
HTTPD thorn, which allows any simulation to act as its own web server
Connect to simulation from any browser anywhere … collaborate
Monitor run: parameters, basic visualization, ...
Change steerable parameters
Running example at www.CactusCode.org
Wireless remote viz, monitoring and steering
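The monitor-and-steer idea above can be illustrated with a minimal sketch. This is not the HTTPD thorn itself: the parameter names, the `/parameters` and `/steer` paths, and the use of Python's standard `http.server` are all assumptions made for illustration.

```python
import copy
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical parameter table; names and values are illustrative only.
PARAMETERS = {
    "out_every": {"value": 10, "steerable": True},
    "grid_points": {"value": 128, "steerable": False},
}
LOCK = threading.Lock()


def steer(name, value):
    """Update a parameter, but only if it is marked steerable."""
    with LOCK:
        entry = PARAMETERS.get(name)
        if entry is None or not entry["steerable"]:
            return False
        entry["value"] = value
        return True


class MonitorHandler(BaseHTTPRequestHandler):
    """Serves the parameter table and accepts steering requests."""

    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        url = urlparse(self.path)
        if url.path == "/parameters":
            with LOCK:
                snapshot = copy.deepcopy(PARAMETERS)
            self._reply(200, snapshot)
        elif url.path == "/steer":
            query = parse_qs(url.query)
            name = query.get("name", [""])[0]
            try:
                value = int(query.get("value", [""])[0])
            except ValueError:
                self._reply(400, {"error": "value must be an integer"})
                return
            ok = steer(name, value)
            self._reply(200 if ok else 403, {"steered": ok})
        else:
            self._reply(404, {"error": "unknown path"})

    def log_message(self, fmt, *args):
        pass  # keep the simulation's own output clean


def start_server(port=0):
    """Start the monitor in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), MonitorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server(8080)` from the simulation's startup code and then opening `/parameters` in a browser would show the table; `/steer?name=out_every&value=5` changes a steerable value. Steering through a GET request is a simplification for browser use, not a recommendation.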
Collaboration focal point for a virtual organization
Interact, share data
Start jobs on remote resources
Move/browse files
Track and monitor announced jobs
Access to new Grid technologies
www.ascportal.org, www.gridsphere.org
[Figure: portal diagram. Labels: “TestBed”, SMS Server, Portal Server, Server, Running Applications]
[Figure: remote visualization data path. Labels: Visualization Tools (OpenDX, Amira, …); HDF5 with GridFTP VFD and Stream VFD; Hyperslabbing, Downsampling; IOStreamedHDF5; GridFTP; Remote Data Server; Simulation]
[Figure: distributed run across two sites]
SDSC IBM SP: 1024 procs, 5x12x17 = 1020
NCSA Origin Array: 256+128+128, 5x12x(4+2+2) = 480
GigE: 100MB/sec
OC-12 line (but only 2.5MB/sec)
These experiments:
Einstein Equations (but could be any Cactus application)
Achieved:
First runs: 15% scaling
With new techniques: 70-85% scaling, ~ 250GF
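The processor counts and link speeds quoted for this run can be checked with a few lines of arithmetic. The sketch below reads the products as processor-grid decompositions, which is an assumption; the numbers themselves are taken from the slide.

```python
from math import prod  # available in Python 3.8+

# Processor layouts as quoted on the slide.
sdsc_layout = (5, 12, 17)
ncsa_layout = (5, 12, 4 + 2 + 2)

sdsc_procs = prod(sdsc_layout)      # 1020 of the 1024 SDSC processors
ncsa_procs = prod(ncsa_layout)      # 480 of the 256+128+128 NCSA processors
total_procs = sdsc_procs + ncsa_procs

# Quoted bandwidths in MB/sec.
gige_mb_per_s = 100.0
wide_area_mb_per_s = 2.5
bandwidth_ratio = gige_mb_per_s / wide_area_mb_per_s

print(sdsc_procs, ncsa_procs, total_procs)  # 1020 480 1500
print(bandwidth_ratio)                      # 40.0
```

The factor of 40 between the two link speeds gives a sense of why the first runs scaled poorly and why the communication pattern had to adapt.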
Dynamic Adaptation: Number of ghostzones, compression, …
“Gordon Bell Prize”
(Supercomputing 2001, Denver)
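One way to see why the number of ghostzones matters on a slow link is a simple cost model. The sketch below assumes that with g ghostzones the code exchanges g layers once every g steps, so latency is amortized while redundant computation grows; the model, the function names, and the latency and compute figures are illustrative assumptions, not measurements from these runs.

```python
def comm_time_per_step(latency_s, bandwidth_bytes_per_s, layer_bytes, ghost_zones):
    """Average communication time per step when `ghost_zones` layers
    are exchanged once every `ghost_zones` steps (assumed model)."""
    if ghost_zones < 1:
        raise ValueError("need at least one ghost zone")
    exchange = latency_s + ghost_zones * layer_bytes / bandwidth_bytes_per_s
    return exchange / ghost_zones


def best_ghost_zones(latency_s, bandwidth_bytes_per_s, layer_bytes,
                     extra_compute_s, candidates=range(1, 9)):
    """Pick the ghost-zone count with the lowest modeled cost per step.

    `extra_compute_s` is a hypothetical redundant-computation cost per
    step for each ghost layer beyond the first.
    """
    def cost(g):
        comm = comm_time_per_step(latency_s, bandwidth_bytes_per_s, layer_bytes, g)
        return comm + (g - 1) * extra_compute_s

    return min(candidates, key=cost)


# 2.5 MB/sec is the wide-area rate quoted above; the other values are made up.
choice = best_ghost_zones(latency_s=0.05, bandwidth_bytes_per_s=2.5e6,
                          layer_bytes=1e5, extra_compute_s=0.002)
```

In this model the transfer volume per step is unchanged; only the per-message latency is spread over more steps, which is why the optimum depends on the link.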
Intelligent Parameter Surveys, Monte Carlo
Dynamic Migration: faster/cheaper/bigger machine
Multiple Universe: create clone to investigate steered parameter
Automatic Component Loading (Needs of process change)
Automatic Convergence Testing
Look Ahead
Spawn Independent/Asynchronous Tasks
Routine Profiling: best machine/queue, parameters
Dynamic Load Balancing: inhomogeneous loads, multiple grids
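As an illustration of what automatic convergence testing could compute, the sketch below estimates the observed convergence order from the same quantity at three resolutions. This is the standard three-level estimate, not necessarily what a Cactus thorn would do; the function name and the test data are hypothetical.

```python
import math


def convergence_order(coarse, medium, fine, refinement=2.0):
    """Observed order p from results at spacings h, h/r, h/r^2.

    Assumes the error behaves like C * h**p, so that
    (coarse - medium) / (medium - fine) is approximately r**p.
    """
    ratio = abs(coarse - medium) / abs(medium - fine)
    return math.log(ratio) / math.log(refinement)


# Synthetic second-order data: f(h) = 1 + h**2 at h = 0.4, 0.2, 0.1.
values = [1 + h ** 2 for h in (0.4, 0.2, 0.1)]
order = convergence_order(*values)
```

A scheduler could spawn the two extra resolutions automatically and flag a run whose observed order drifts from the expected one.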
Cactus experiments with grid computing on the E-Grid
Cactus Worm: thorns which allowed simulations to migrate themselves from machine to machine
Spawning: sending (asynchronous) calculations in analysis thorns to different machines
We wrote the GridLab proposal to be able to do these, and other scenarios, in a better way
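The spawning scenario can be sketched as follows: the main evolution loop hands analysis work to other workers and carries on without waiting. Here a local thread pool stands in for remote machines, and `evolve` and `analyse` are hypothetical stand-ins for a simulation step and an analysis thorn.

```python
from concurrent.futures import ThreadPoolExecutor


def evolve(state):
    """Stand-in for one evolution step."""
    return [0.5 * x for x in state]


def analyse(snapshot):
    """Stand-in for an analysis routine run away from the main loop."""
    return max(abs(x) for x in snapshot)


def run(steps, analyse_every=2):
    state = [1.0, -2.0, 4.0]
    pending = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for step in range(1, steps + 1):
            state = evolve(state)
            if step % analyse_every == 0:
                # Hand a copy of the data to the pool and keep evolving.
                pending.append((step, pool.submit(analyse, list(state))))
        # Collect results only once the evolution has finished.
        results = {step: future.result() for step, future in pending}
    return state, results


final_state, analysis = run(4)
print(analysis)  # {2: 1.0, 4: 0.25}
```

The point of the pattern is that the evolution never blocks on analysis; on a grid the `submit` call would ship the snapshot to another machine instead of another thread.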
GridLab (January 2002-December 2004):
Many partners in Europe and US
PSNC (Poland), AEI & ZIB (Germany), VU (Netherlands), MASARYK (Czech), SZTAKI (Hungary), ISUFI (Italy), Cardiff (UK), NTUA (Greece), Chicago, ISI & Wisconsin (US), Sun/Compaq
LSU now a collaborating partner
Developing an easy-to-use, flexible, generic and modular Grid Application Toolkit (GAT), enabling applications to make innovative use of global computing resources
Focused on two principles:
co-development of infrastructure with real applications and user communities (Badly needed in grid computing!!)
dynamic use of grids, with self-aware simulations adapting to their changing environment.
12 Work Packages covering:
Grid Portals
Mobile Users
Different Grid Services
Applications
(Development) Test Bed
Grid Application Toolkit (GAT)
Need a layer between applications and grid infrastructure:
Higher level than existing grid APIs, hide complexity, abstract grid functionality through application oriented APIs
Insulate against rapid evolution of grid infrastructure and state of grid deployment
Choose between different grid infrastructures
Make it possible for application developers to use and develop for the grid independent of the state of deployment of the grid infrastructure
[Figure: GAT layer between the application and grid infrastructure. Labels, top to bottom:]
Application: “Is there a better resource I could be using?”
SOAP, WSDL, Corba, OGSA, Other
Monitoring, Notification, Security, Data Management, Profiling, Information, Resource Management, Application Manager, Logging, Migration
GLOBUS, Other Grid Infrastructure?
Application: “Is there a better resource I could be using?”
GAT_FindResource( )
[Figure: the same Application + GAT stack on a Laptop, a Super Computer, and The Grid]
No network!
Firewall issues!
Standard API and Toolkit for developing portable Grid applications independently of the underlying Grid infrastructure and available services
Implements the GAT-API
Used by applications (different languages)
GAT Adaptors
Connect to capabilities/services
GAT Engine
Provides the function bindings for the GAT-API
http://www.gridlab.org/software/GAT
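The split between API, engine and adaptors can be illustrated with a small sketch. It is not the GAT implementation or its real API: the class names, the `find_resource` method and the resource catalog are hypothetical, chosen only to show an engine that tries adaptors in turn and falls back when a service is unreachable.

```python
class AdaptorError(Exception):
    """Raised when an adaptor cannot provide the requested capability."""


class GridAdaptor:
    """Hypothetical adaptor backed by a grid information service."""
    name = "grid"

    def __init__(self, reachable=True):
        self.reachable = reachable
        self.catalog = [
            {"host": "cluster-a", "cpus": 64},
            {"host": "cluster-b", "cpus": 256},
        ]

    def find_resource(self, min_cpus):
        if not self.reachable:
            raise AdaptorError("grid: no network")
        fits = [r for r in self.catalog if r["cpus"] >= min_cpus]
        if not fits:
            raise AdaptorError("grid: nothing large enough")
        return min(fits, key=lambda r: r["cpus"])


class LocalAdaptor:
    """Fallback adaptor: always offers the machine we are running on."""
    name = "local"

    def find_resource(self, min_cpus):
        return {"host": "localhost", "cpus": 1}


class Engine:
    """Forwards an API call to the first adaptor that can serve it."""

    def __init__(self, adaptors):
        self.adaptors = list(adaptors)

    def find_resource(self, min_cpus=1):
        errors = []
        for adaptor in self.adaptors:
            try:
                return adaptor.name, adaptor.find_resource(min_cpus)
            except AdaptorError as exc:
                errors.append(str(exc))
        raise AdaptorError("; ".join(errors))


online = Engine([GridAdaptor(reachable=True), LocalAdaptor()])
offline = Engine([GridAdaptor(reachable=False), LocalAdaptor()])
```

The application code is the same in both cases; only the adaptor that answers differs, which is the property that lets one binary run on a laptop without network and on a grid.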
[Figure: Cactus and GAT architecture. Labels: Cactus Flesh; Grid Scenario Thorns; Thorns (Physics and Computational Infrastructure Modules); CGAT Thorn (Cactus GAT wrappers, additional functionality, build system); GAT Library; GridLab Services]
TFM implemented in Cactus
GAT used for starting remote TFMs
Designed for the Grid
Tasks can be anything
Task farm small Cactus black hole simulations across testbed
Parameter survey: black hole corotation parameter
Results steer a large production black hole simulation
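A task-farmed parameter survey of this kind can be sketched in a few lines. The thread pool below stands in for remote TFMs, `run_task` is a hypothetical stand-in for one small simulation, and the figure of merit it returns is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(parameter):
    """Stand-in for one small simulation; returns a made-up figure of merit."""
    return (parameter - 0.3) ** 2


def survey(values, workers=4):
    """Farm one task per parameter value and return the best value found."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(run_task, values))
    results = dict(zip(values, scores))
    best = min(results, key=results.get)
    return best, results


best_parameter, all_results = survey([0.0, 0.1, 0.2, 0.3, 0.4])
print(best_parameter)  # 0.3
```

The best value would then be passed to the large production run as a steered parameter, closing the loop described above.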
[Figure: dynamic Grid scenario across SDSC, RZG, LRZ and NCSA. Labels: Brill Wave; Find best resources; Look for horizon; Calculate/Output Invariants; Found a horizon, try out excision; Clone job with steered parameter; Queue time over, find new machine; Free CPUs!!; Add more resources; Archive data; Archive to LIGO public database; job markers S, S1, S2, P1, P2]