Introduction to the TeraGrid

Daniel S. Katz (d.katz@ieee.org)
Lead, LONI as a TeraGrid Resource Provider
Director of Science, TeraGrid GIG
Director, Cyberinfrastructure Development, Center for Computation & Technology (CCT), LSU
Adjunct Associate Professor, Electrical and Computer Engineering Department, LSU

What is the TeraGrid?
• World's largest open scientific discovery infrastructure
• Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
  – High-performance networks
  – High-performance computers (>1 Pflops across ~100,000 cores, growing to 1.75 Pflops)
    • And a Condor pool (w/ ~13,000 CPUs)
  – Visualization systems
  – Data collections (>30 PB, >100 discipline-specific databases)
  – Science Gateways
  – User portal
  – User services: help desk, training, advanced application support
• Allocated to US researchers and their collaborators through a national peer-review process
  – Generally, review of computing, not science
• Extremely user-driven
  – MPI jobs, ssh or grid (GRAM) access, etc.

Governance
• 11 Resource Providers (RPs) funded under separate agreements with NSF
  – Different start and end dates
  – Different goals
  – Different agreements
  – Different funding models
• 1 coordinating body: the Grid Integration Group (GIG)
  – University of Chicago/Argonne
  – Subcontracts to all RPs and six other universities
  – 7-8 Area Directors
  – Working groups with members from many RPs
  – TeraGrid Forum with Chair

TeraGrid
[Map of the TeraGrid resource provider sites and network]

TG New Large Resources
• Ranger@TACC
  – First NSF 'Track2' HPC system
  – 504 TF
  – 15,744 quad-core AMD Opteron processors
  – 123 TB memory, 1.7 PB disk
• Kraken@NICS (UT/ORNL)
  – Second NSF 'Track2' HPC system
  – 170 TF Cray XT4 system
  – To be upgraded to a Cray XT5 at 1 PF
    • 10,000+ compute sockets
    • 100 TB memory, 2.3 PB disk
• Blue Waters@NCSA
  – NSF Track 1
  – 10 PF peak
  – Coming in 2011
• Something@PSC
  – Third NSF 'Track2' HPC system

Who Uses TeraGrid (2007)
[Pie chart, by discipline:]
• Molecular Biosciences: 31%
• Physics: 17%
• Chemistry: 17%
• Astronomical Sciences: 12%
• Materials Research: 6%
• Chemical, Thermal Systems: 5%
• Advanced Scientific Computing: 4%
• Earth Sciences: 3%
• All 19 others: 3%
• Atmospheric Sciences: 2%

How TeraGrid Is Used
(2006 data; community size is a rough estimate of the number of users)
• Batch Computing on Individual Resources: 850 (see the sketch below)
• Exploratory and Application Porting: 650
• Workflow, Ensemble, and Parameter Sweep: 250
• Science Gateway Access: 500
• Remote Interactive Steering and Visualization: 35
• Tightly-Coupled Distributed Computation: 10
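Batch computing on individual resources is the largest modality in the table above. As a rough illustration (not part of the original slides), the Python sketch below writes and submits a PBS batch script for a single-resource MPI job; the allocation ID, queue name, node geometry, and application name are hypothetical placeholders that vary by resource provider.

```python
# A minimal sketch (not from the slides) of the dominant 2006 modality above:
# batch MPI computing on a single resource, submitted to a local PBS queue.
# The allocation ID, queue name, node geometry, and application name below are
# hypothetical placeholders; real values come from each RP's user guide.
import subprocess
import textwrap

pbs_script = textwrap.dedent("""\
    #!/bin/bash
    #PBS -N sample_mpi_job
    #PBS -A TG-ABC123456
    #PBS -q normal
    #PBS -l nodes=16:ppn=4
    #PBS -l walltime=02:00:00
    cd $PBS_O_WORKDIR
    mpirun -np 64 ./my_mpi_app input.dat
    """)

with open("job.pbs", "w") as f:
    f.write(pbs_script)

# Submit from a login node (reached via ssh or gsissh); qsub prints the job ID.
result = subprocess.run(["qsub", "job.pbs"], capture_output=True, text=True)
print("Submitted:", (result.stdout or result.stderr).strip())
```

Exploratory use and application porting, the second-largest modality, typically starts the same way, just with smaller node counts and shorter wall times.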
How One Uses TeraGrid
[Diagram: users reach resources through the User Portal, Science Gateways, or the command line; allocations are requested through POPS (for now); a common TeraGrid infrastructure layer (accounting, network, authorization, …) connects these access routes to each RP's compute, visualization, and data services (RP 1, RP 2, RP 3).]
Slide courtesy of Dane Skow and Craig Stewart

User Portal: portal.teragrid.org
[Screenshot of the TeraGrid user portal]

Access to resources
• Terminal: ssh, gsissh
• Portal: TeraGrid user portal, Gateways
  – Once logged in to the portal, click on "Login"
• Also, single sign-on (SSO) from the command line

User Portal: User Information
• Knowledge Base for quick answers to technical questions
• Documentation
• Science highlights
• News and press releases
• Education, outreach and training events and resources

Data Storage Resources
• GPFS-WAN
  – 700 TB disk storage at SDSC, accessible from machines at NCAR, NCSA, SDSC, ANL
• Data Capacitor
  – 535 TB storage at IU, including databases
• Data collections
  – Storage at SDSC (files, databases) for collections used by communities
• Tape storage
  – Available at IU, NCAR, NCSA, SDSC
• Access is generally through GridFTP, via the portal or the command line (see the transfer sketch later in this section)

Science Gateways
• A natural extension of the Internet & Web 2.0
• The idea resonates with scientists
  – Researchers can imagine scientific capabilities provided through a familiar interface
    • Mostly a web portal, or a web or client-server program
• Designed by communities; provide interfaces understood by those communities
  – Also provide access to greater capabilities (back end)
  – Without requiring the user to understand the details of those capabilities
  – Scientists know they can undertake more complex analyses, and that's all they want to focus on
  – TeraGrid provides tools to help developers
• Seamless access doesn't come for free
  – It hinges on very capable developers
Slide courtesy of Nancy Wilkins-Diehr

Current Science Gateways
• Biology and Biomedicine Science Gateway
• Open Life Sciences Gateway
• The Telescience Project
• Grid Analysis Environment (GAE)
• Neutron Science Instrument Gateway
• TeraGrid Visualization Gateway, ANL
• BIRN
• Open Science Grid (OSG)
• Special PRiority and Urgent Computing Environment (SPRUCE)
• National Virtual Observatory (NVO)
• Linked Environments for Atmospheric Discovery (LEAD)
• Computational Chemistry Grid (GridChem)
• Computational Science and Engineering Online (CSE-Online)
• GEON (GEOsciences Network)
• Network for Earthquake Engineering Simulation (NEES)
• SCEC Earthworks Project
• Network for Computational Nanotechnology and nanoHUB
• GIScience Gateway (GISolve)
• Gridblast Bioinformatics Gateway
• Earth Systems Grid
• Astrophysical Data Repository (Cornell)
Slide courtesy of Nancy Wilkins-Diehr

Focused Outreach
• Campus Champions
  – Volunteer effort of a staff member at a university
• Pathways for Broadening Participation
  – Low-level trial effort
  – TeraGrid RP staff have extended interaction with MSIs
• Both
  – Have an initial small TG allocation
    • Can create and distribute suballocations very quickly
  – Work with users as needed to help them
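The storage slide above notes that GPFS-WAN, the Data Capacitor, and the tape archives are generally reached through GridFTP. As a hedged sketch (not from the slides), the Python wrapper below calls the Globus Toolkit's globus-url-copy client to push a file to a GridFTP endpoint; the endpoint hostname and remote path are invented, and a valid grid proxy (for example, one obtained with myproxy-logon, the same credential used for gsissh single sign-on) is assumed to already exist.

```python
# A hedged sketch (not from the slides) of command-line data movement with GridFTP,
# wrapped in Python. globus-url-copy is the Globus Toolkit transfer client of this
# era; the endpoint hostname and remote path below are invented, and a valid grid
# proxy certificate (e.g., from myproxy-logon) is assumed to already be in place.
import subprocess

def gridftp_put(local_path: str, remote_url: str, streams: int = 4) -> None:
    """Copy a local file to a gsiftp:// URL using globus-url-copy.

    -vb reports transfer performance; -p sets the number of parallel TCP streams.
    """
    cmd = [
        "globus-url-copy", "-vb", "-p", str(streams),
        f"file://{local_path}",     # local_path must be absolute
        remote_url,
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical GridFTP endpoint; real server names come from the RP or the portal.
    gridftp_put("/tmp/results.tar",
                "gsiftp://gridftp.example.teragrid.org/~/results.tar")
```

The portal's file-management interface drives the same kind of transfer without the command line, which is the route most gateway users see.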
TG App: Predicting Storms
• Hurricanes and tornadoes cause massive loss of life and damage to property
• TeraGrid supported the spring 2007 NOAA and University of Oklahoma Hazardous Weather Testbed
  – Major goal: assess how well ensemble forecasting predicts thunderstorms, including the supercells that spawn tornadoes (see the schematic sketch at the end of this deck)
  – Nightly reservation at PSC, spawning jobs at NCSA as needed for details
  – Input, output, and intermediate data transfers
  – Delivers "better than real time" prediction
  – Used 675,000 CPU hours for the season
  – Used 312 TB of HPSS storage at PSC
Slide courtesy of Dennis Gannon (ex-IU) and the LEAD Collaboration

App: GridChem
• Different licensed applications with different queues
• Will be scheduled for workflows
Slide courtesy of Joohyun Kim

Apps: Genius and Materials
• Genius: modeling blood flow before (during?) surgery, using HemeLB on LONI
• Materials: fully-atomistic simulations of clay-polymer nanocomposites, using LAMMPS on TeraGrid
• Why cross-site / distributed runs?
  1. Rapid turnaround: conglomeration of idle processors to run a single large job
  2. Run big-compute and big-memory jobs not possible on a single machine
Slide courtesy of Steven Manos and Peter Coveney

TeraGrid: Both Operations and Research
• Operations
  – Facilities/services on which users rely
  – Infrastructure on which other providers build
AND
• R&D
  – Learning how to do distributed, collaborative science on a global, federated infrastructure
  – Learning how to run multi-institution shared infrastructure
• Looking for science that needs TG and NGS (UK)/DEISA
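The LEAD forecasting runs above are an example of the workflow/ensemble/parameter-sweep modality from the usage table. The sketch below (an illustration, not the LEAD system) shows the basic pattern: one independent batch job per ensemble member, each with its own perturbation parameter. The member count, scheduler directives, allocation ID, and the forecast executable with its flags are all invented; the real campaign relied on reservations at PSC and gateway/workflow tooling rather than a plain submission loop.

```python
# A schematic sketch (not the LEAD system) of the ensemble / parameter-sweep
# pattern: one independent batch job per ensemble member. Everything concrete
# here (member count, PBS directives, allocation ID, forecast executable and
# its flags) is invented for illustration only.
import subprocess

N_MEMBERS = 10                      # hypothetical ensemble size
TEMPLATE = """#!/bin/bash
#PBS -N ens_{member:02d}
#PBS -A TG-ABC123456
#PBS -l nodes=32:ppn=4,walltime=04:00:00
cd $PBS_O_WORKDIR
./forecast_model --member {member} --perturbation-seed {seed}
"""

for m in range(N_MEMBERS):
    path = f"ens_{m:02d}.pbs"
    with open(path, "w") as f:
        # Each member gets a distinct perturbation seed, so the runs are independent.
        f.write(TEMPLATE.format(member=m, seed=1000 + m))
    subprocess.run(["qsub", path], check=True)   # submit one batch job per member
```

Gateways such as LEAD wrap this kind of pattern behind a web interface, so users launch ensembles without ever seeing the individual job scripts, which is exactly the back-end capability the Science Gateways slide describes.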