Supercomputing Systems at NCAR
SC2005
Marc Genty, Supercomputing Services Group
November 15 - 17, 2005

NCAR Mission

The National Center for Atmospheric Research (NCAR) is a federally funded research and development center. Together with our partners at universities and research centers, we are dedicated to exploring and understanding our atmosphere and its interactions with the Sun, the oceans, the biosphere, and human society.

NCAR consists of the following:
• Computational and Information Systems Laboratory (CISL)
  – SCD: Scientific Computing Division
  – IMAGe: Institute for Mathematics Applied to Geosciences
• Earth and Sun Systems Laboratory (ESSL)
  – ACD: Atmospheric Chemistry Division
  – CGD: Climate & Global Dynamics Division
  – HAO: High Altitude Observatory
  – MMM: Mesoscale & Microscale Meteorology Division
  – TIMES: The Institute for Multidisciplinary Earth Studies
• Earth Observing Laboratory (EOL)
  – HIAPER: High-Performance Instrumented Airborne Platform for Environmental Research
• Research Applications Laboratory (RAL)
  – RAP: Research Applications Programs
• Societal-Environmental Research and Education Laboratory (SERE)
  – ASP: Advanced Study Program
  – CCB: Center for Capacity Building
  – ISSE: Institute for the Study of Society and Environment (formerly ESIG)

NCAR Science
Space Weather, Climate, Turbulence, Weather, The Sun, Atmospheric Chemistry
More than just the atmosphere… from the earth's oceans to the solar interior

2005: Climate Simulation Lab Science
• Community Climate System Model (CCSM)
• Modeling Climate Change and Climate Variability in Coupled Climate-Land Vegetation Models: Present, Past, and Future Climates
• 50-year Regional Downscaling of NCEP/NCAR Reanalysis Over California Using the Regional Spectral Model
• Climate Variability in the Atlantic Basin
• Aerosol Effects on the Hydrological Cycle
• Pacific Decadal Variability due to Tropical-Extratropical Interaction
• Predictability of the Coupled Ocean-Atmosphere-Land Climate System: Seasonal-to-Interannual Time Scales
• The Whole Atmosphere Community Climate Model
• Decadal to Century Coupled Ocean/Ice Simulations at High Resolution (0.2) Using an Innovative Geodesic Grid
• Ocean State Estimation
• Development and Application of Seasonal Climate Predictions
• => http://www.scd.ucar.edu/csl/cslannual0505.html

69 Member Universities
University of Alabama in Huntsville; University of Illinois at Urbana-Champaign; Princeton University; University of Alaska; University of Iowa; Purdue University; University at Albany, State University of New York; Iowa State University; University of Rhode Island; The Johns Hopkins University; Rice University; University of Arizona; University of Maryland; Rutgers University; Arizona State University; Massachusetts Institute of Technology; Saint Louis University; California Institute of Technology; McGill University; University of California, Berkeley; University of Miami; Stanford University; Scripps Institution of Oceanography at UCSD; University of California, Davis; University of Michigan - Ann Arbor; Texas A&M University; University of California, Irvine; University of Minnesota; University of Texas at Austin; University of California, Los Angeles; University of Missouri; Texas Tech University; University of Chicago; Naval Postgraduate School; University of Toronto; Colorado State University; University of Nebraska, Lincoln; Utah State University; University of Colorado at Boulder; Nevada System of Higher Education; University of Utah; Columbia University; University of New Hampshire, Durham; University of Virginia; Cornell University; University of Washington; University of Denver; New Mexico Institute of Mining and Technology; Drexel University; New York University; University of Wisconsin - Madison; Florida State University; North Carolina State University; University of Wisconsin - Milwaukee; Georgia Institute of Technology; The Ohio State University; Woods Hole Oceanographic Institution; Harvard University; University of Oklahoma; University of Wyoming; University of Hawaii; Old Dominion University; Yale University; University of Houston; Oregon State University; York University; Howard University; Pennsylvania State University; Washington State University
http://www.ucar.edu/governance/members/institutions.shtml

SCD Mission

The Scientific Computing Division (SCD) is part of the National Center for Atmospheric Research (NCAR) in Boulder, Colorado. The goal of SCD is to enable the best atmospheric research in the world by providing and advancing high-performance computing technologies. SCD offers computing, research datasets, data storage, networking, and data analysis tools to advance the scientific research agenda of NCAR. NCAR is managed by the University Corporation for Atmospheric Research (UCAR) and is sponsored by the National Science Foundation.

The NCAR Mesa Lab

History of Supercomputing at NCAR
[Timeline chart (revised Nov '05), 1960-2005, of production and non-production machines, with a marker for systems currently in the NCAR/SCD computing facility. Machines shown: IBM p5-575/624 bluevista; Aspen Nocona/InfiniBand coral; IBM BlueGene/L frost; IBM e1350/140 pegasus; IBM e1350/264 lightning; IBM p690-C/1600 bluesky; IBM p690-F/64 thunder; IBM p690-C/1216 bluesky; SGI Origin 3800/128 tempest; IBM p690-C/16 bluedawn; IBM SP WH2/1308 blackforest; IBM SP WH2/604 blackforest; IBM SP WH2/64 babyblue; Compaq ES40/36 prospect; IBM SP WH1/296 blackforest; IBM SP WH1/32 babyblue; Beowulf/16 tevye; SGI Origin2000/128 ute; Cray J90se/24 chipeta; HP SPP-2000/64 sioux; Cray T3D/128; Cray C90/16 antero; Cray J90se/24 ouray; Cray J90/20 aztec; Cray J90/16 paiute; Cray T3D/64; Cray Y-MP/8I antero; CCC Cray 3/4 graywolf; IBM SP1/8 eaglesnest; TMC CM5/32 littlebear; IBM RS/6000 Cluster; Cray Y-MP/2 castle; Cray Y-MP/8 shavano; TMC CM2/8192 capitol; Cray X-MP/4; Cray 1-A S/N 14; Cray 1-A S/N 3; CDC 7600; CDC 6600; CDC 3600]

In the beginning… (1963)

CDC 3600 System Overview
• Circuitry Design: Seymour Cray
• Clock Speed: 0.7 MHz
• Memory: 32 Kbytes
• Peak Performance: 1.3 MFLOPs

Today's NCAR Supercomputers
• Bluesky [IBM POWER4 AIX - Production - General Scientific Use]
  – 125-node (50 frame) p690 cluster, 1600 1.3GHz CPUs, SP Switch2, 15TB FC disk
  – Configured as 76 8-way (LPAR) nodes and 25 32-way (SMP) nodes
• Bluevista [IBM POWER5 AIX - Production - General Scientific Use]
  – 78-node p575 cluster, 624 1.9GHz CPUs, HPS Switch, 55TB FC disk
  – NCAR codes are typically seeing a speedup of 2x-3x over bluesky
• Frost [IBM Blue Gene/L - Single Rack - Pset Size = 32]
• Lightning [IBM SuSE Linux - Production - General Scientific Use]
  – 132-node AMD64/Xeon cluster, 264 2.2/3.0GHz CPUs, Myrinet Switch, ~6TB SATA disk
• Pegasus [IBM SuSE Linux - Production - Real-Time Weather Forecasting]
  – 70-node AMD64/Xeon cluster, 140 2.2/3.0GHz CPUs, Myrinet Switch, ~6TB SATA disk
• Coral [Aspen Systems SuSE Linux - Production - IMAGe Divisional System]
  – 24-node Nocona cluster, 44 3.2GHz CPUs, InfiniBand Switch, ~6TB SATA disk
• Test Systems: Thunder [P4/HPS], Bluedawn [P4/SP Switch2], Otis [P5/HPS]
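As a rough cross-check of the node and CPU counts quoted in the list above, the per-node CPU counts can simply be multiplied out. This is a minimal illustrative sketch using only numbers from the slides; the role of bluesky's CPUs beyond the 1408 in the batch LPAR/SMP partitions is an assumption, not something the slides state.

```python
# Rough cross-check of the cluster CPU counts quoted above (illustrative only).
clusters = {
    # name: list of (node_count, cpus_per_node)
    "bluesky (batch LPAR/SMP partitions)": [(76, 8), (25, 32)],  # 608 + 800 = 1408
    "bluevista": [(78, 8)],    # 78 p575 nodes, 8 CPUs each -> 624
    "lightning": [(132, 2)],   # 132 dual-CPU nodes -> 264
    "pegasus":   [(70, 2)],    # 70 dual-CPU nodes -> 140
}

for name, parts in clusters.items():
    total = sum(nodes * cpus for nodes, cpus in parts)
    print(f"{name}: {total} CPUs")

# bluesky has 1600 CPUs overall; the ~192 CPUs beyond the 1408 counted here
# presumably serve interactive, I/O, and other support roles (an assumption --
# the slides do not break this out).
```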
Bluesky

Bluesky System Overview
• IBM POWER4 Cluster 1600
• AIX 5.1, PSSP, GPFS, LoadLeveler
• 125-node (50 frame) p690 cluster
• Compute Node Breakdown: 76 8-way (LPAR) & 25 32-way (SMP)
• 1600 1.3GHz CPUs
• SP Switch2 (Colony)
• 15TB FC disk
• General purpose, computational resource

Bluesky 32-Way LPAR Usage
[Utilization chart: % User / % Idle / % System, late Aug 2004 through late Oct 2005]

Bluesky 8-Way LPAR Usage
[Utilization chart: % User / % Idle / % System, late Aug 2004 through late Oct 2005]

Bluesky Science Highlights
• CCSM3: The Community Climate System Model
  – Fully-coupled, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states
  – The CCSM3 IPCC (Intergovernmental Panel on Climate Change) integrations now include roughly 11,000 years of simulated climate (19th - 24th centuries)
  – The CCSM3 control run archive contains 4,500 years of simulated climate at three resolutions
  – http://www.ccsm.ucar.edu/

Bluesky Science Highlights
• ARW: Advanced Research WRF (Weather Research & Forecasting) Model
  – Next-Generation Mesoscale Numerical Weather Prediction System
  – http://www.mmm.ucar.edu/index.php

Bluevista

Bluevista System Overview
• IBM POWER5 Cluster
• AIX 5.2, CSM, GPFS, LSF
• 78-node p575 cluster
• 624 1.9GHz CPUs
• HPS Switch (Federation)
• 55TB FC disk
• General purpose, computational resource
• NCAR codes are typically seeing a speedup of 2x-3x over bluesky
• The bluevista cluster is estimated to have the same sustained computing capacity as the bluesky cluster

Bluevista Usage (Not Yet In Full Production)
[Utilization chart: % User / % Idle / % System, early-to-mid Oct 2005]

Bluevista Science Highlights
• 2005: Nested Regional Climate Model (NRCM)
  – Focus: To develop a state-of-the-science nested climate model based on WRF and to provide it to the community
  – http://www.mmm.ucar.edu/facilities/nrcm/nrcm.php
• 2005: Limited friendly-user time also allocated
• 2006: General scientific production system
  – Will augment bluesky capacity

Frost

Frost System Overview
• IBM Blue Gene/L
• Single rack
  – One I/O node per thirty-two compute nodes (pset size = 32)
• Service node
  – One IBM p630 server (two POWER4+ 1.2GHz CPUs & 4GB memory)
  – SuSE (SLES9), DB2 FixPak9
• Front-end nodes
  – Four IBM OpenPower 720 servers (four POWER5 1.65GHz CPUs & 8GB memory each)
  – SuSE (SLES9), GPFS, COBALT

Blue Gene/L At NCAR

Blue Gene/L is jointly owned and collaboratively managed by NCAR and the University of Colorado (Boulder & Denver). There are Principal Investigators (PIs) associated with each research facility, and each PI has a small group of scientists running on the system. Blue Gene/L is a targeted system at this time, with allocations split among the three primary research facilities.

Bluesky / Frost Side-By-Side
                 Bluesky   Frost
Processors:      1600      2048
Peak Teraflops:  8.3       5.73
Linpack:         4.2       4.6
Power (kW):      400*      25*
*The average personal computer consumes about 0.12 kW
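To make the power comparison above concrete, here is a minimal sketch, using only the numbers on the side-by-side slide, that normalizes Linpack throughput by power draw and expresses each machine's power budget in units of the quoted 0.12 kW "average personal computer". Nothing beyond those slide figures is assumed.

```python
# Power-efficiency comparison using only the figures from the side-by-side slide.
systems = {
    "bluesky": {"linpack_tflops": 4.2, "power_kw": 400.0},
    "frost":   {"linpack_tflops": 4.6, "power_kw": 25.0},
}
PC_KW = 0.12  # "average personal computer" power draw quoted on the slide

for name, s in systems.items():
    gflops_per_kw = s["linpack_tflops"] * 1000.0 / s["power_kw"]
    pcs_equiv = s["power_kw"] / PC_KW
    print(f"{name}: {gflops_per_kw:.1f} Linpack GFLOPS per kW "
          f"(power of ~{pcs_equiv:.0f} average PCs)")

# Approximate output:
#   bluesky: 10.5 Linpack GFLOPS per kW (power of ~3333 average PCs)
#   frost: 184.0 Linpack GFLOPS per kW (power of ~208 average PCs)
```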
Frost Usage

Frost Principal Research Areas
• Climate and Weather Simulation
  – http://www.ucar.edu/research/climate/
  – http://www.ucar.edu/research/prediction/
• Computational Fluid Dynamics and Turbulence
  – http://www.image.ucar.edu/TNT/
• Coupled Atmosphere-Fire Modeling
  – http://www.ucar.edu/research/climate/drought.shtml
• Scalable Solvers
  – http://amath.colorado.edu/faculty/tmanteuf/
• Aerospace Engineering
  – http://icme.stanford.edu/faculty/cfarhat.html

Frost Science Highlights
"Modeling Aqua Planet on Blue Gene/L"
Dr. Amik St-Cyr - Scientist, Computational Science Section, Scientific Computing Division
• NCAR Booth: 1:00PM - Tuesday, November 15, 2005
• NCAR Booth: 1:00PM - Thursday, November 16, 2005

Lightning

Lightning System Overview
• IBM Cluster 1350
• SuSE (SLES9) Linux, CSM, GPFS, LSF
• 132-node AMD64/Xeon cluster
• 264 2.2/3.0GHz CPUs
• Myrinet Switch
• ~6TB SATA disk
• General purpose, computational resource

Lightning Usage
[Utilization chart: % User / % Idle / % System, Nov 2004 through Oct 2005]

Lightning Science Highlights
• TCSP: Tropical Cloud Systems and Processes field research investigation
  – Joint NCAR/NASA/NOAA study of the dynamics and thermodynamics of precipitating cloud systems, including tropical cyclones
  – http://box.mmm.ucar.edu/projects/wrf_tcsp/

Lightning Science Highlights
• TIME-GCM: Thermosphere Ionosphere Mesosphere Electrodynamics General Circulation Model
  – Distributed memory parallelism using eight nodes (16 MPI tasks)
  – Run completed in 12 jobs at 5-6 wallclock hours each for a total of 70 hours (~12 minutes per simulated day)
  – http://www.hao.ucar.edu/Public/models/models.html
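A quick arithmetic sketch of the TIME-GCM throughput figures above. The length of the simulated period is derived here from the quoted wallclock total and per-day rate; it is not stated on the slide, so treat it as an estimate only.

```python
# Arithmetic check of the TIME-GCM run statistics quoted above.
jobs = 12
total_hours = 70.0
minutes_per_sim_day = 12.0          # "~12 minutes per simulated day"

hours_per_job = total_hours / jobs  # ~5.8 h, consistent with "5-6 wallclock hours"
simulated_days = total_hours * 60.0 / minutes_per_sim_day

print(f"hours per job: {hours_per_job:.1f}")
print(f"implied simulated period: ~{simulated_days:.0f} days")
# 70 h * 60 min/h / 12 min/day = 350 simulated days (roughly a year) --
# a derived estimate, not a figure taken from the slide.
```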
Pegasus

Pegasus System Overview
• IBM Cluster 1350
• SuSE (SLES9) Linux, CSM, GPFS, LSF
• 70-node AMD64/Xeon cluster
• 140 2.2/3.0GHz CPUs
• Myrinet Switch
• ~6TB SATA disk
• Essentially a 0.5 scale model of the lightning cluster
• Real-Time Weather Forecasting

Pegasus Usage
[Utilization chart: % User / % Idle / % System, Apr 2005 through Oct 2005]

Pegasus Science Highlights
• AMPS: Antarctic Mesoscale Prediction System (Polar MM5)
  – Twice-daily operational forecasts for the Antarctic region (McMurdo Station, Antarctica)
  – Sponsored by the NSF Office of Polar Programs
  – http://box.mmm.ucar.edu/rt/mm5/amps/

Coral

Coral System Overview
• Aspen Systems 24-node Nocona cluster (48 3.2/3.6GHz CPUs)
• SuSE (SLES9) Linux, ABC, NFS, LSF
• Two HP Visualization Nodes (RedHat Enterprise Linux V3)
• InfiniBand Switch
• ~6TB SATA disk
• Dedicated resource belonging to the Institute for Mathematics Applied to the Geosciences (IMAGe) Division
  – http://www.image.ucar.edu/

Tempest
• SGI Origin 3800
• IRIX 6.5.25, NQE
  – No cluster management software or parallel file system
• 128 500-MHz R14000 CPUs
• 64GB Distributed Shared Memory
• NUMAlink Interconnect
• ~8.5TB Ciprico SATA RAID disk
• General purpose, post-processing and data analysis server
• Managed by the Data Analysis Services Group (DASG)

Tempest Science Highlights
"Desktop Techniques for the Exploration of Terascale-sized Turbulence Data Sets"
John Clyne - Senior Software Engineer, High-Performance Systems Section
• NCAR Booth: 4:00PM - Tuesday, November 15, 2005
• NCAR Booth: 3:00PM - Thursday, November 17, 2005

Blackforest (318 WHII Node RS/6000 SP)
R.I.P. - 12Jan05 @ 8am

Blackforest Highlights
• 5.4 Year Lifetime
• 30.5 Million CPU Hours Of Work
• 600,000 Batch Jobs
• 50 CPU Hours/Job (On Average)
• 27.28 CPUs (7 Nodes) - Average Job

NCAR Supercomputer Performance Numbers

System Name     Peak    Memory    Disk      Est'd Power   Sustained   Code        Est'd Sustained  MFLOPs/
                TFLOPs  (TBytes)  (TBytes)  Consumption   Efficiency  meas./est.  GFLOPs           Watt
                                            (kWatts)
Production Systems
bluevista       4.742   1.248     55.0      276           7.8%        est.        369.9            1.34
bluesky         8.320   3.328     28.5      385           4.3%        meas.       355.3            0.92
lightning       1.162   0.544     7.8       48            5.8%        est.        67.4             1.40
tempest         0.128   0.064     7.9       50.0          9.8%        est.        12.5             0.25
Research Systems, Divisional Systems & Test Systems
frost (BG/L)    5.734   0.524     6.6       25.2          6.8%        est.        389.9            15.5
pegasus (AMPS)  0.616   0.288     5.6       28            5.8%        est.        35.7             1.28
coral (IMAGe)   0.256   0.088     6.4       9             5.0%        est.        12.8             1.42
thunder         0.333   0.128     1.2       15.5          5.1%        est.        17.1             1.10
bluedawn        0.070   0.032     0.7       6.5           4.3%        est.        3.0              0.46
TOTAL           21.36   6.24      119.63    843.20                                1263.61          23.65
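The derived columns in the table above follow directly from the peak, sustained, and power figures. The sketch below reproduces two of them for a few rows; the formulas (sustained efficiency = sustained / peak, MFLOPs/Watt = sustained GFLOPs / power in kW) are my reading of how the table was built, stated here as an assumption.

```python
# Reproduce the derived columns of the performance table for a few rows.
# Assumed formulas: efficiency = sustained / peak; MFLOPs/Watt = sustained GFLOPs / power kW.
rows = [
    # (name, peak TFLOPs, sustained GFLOPs, power kW)
    ("bluevista", 4.742, 369.9, 276.0),
    ("bluesky",   8.320, 355.3, 385.0),
    ("frost",     5.734, 389.9, 25.2),
]

for name, peak_tflops, sustained_gflops, power_kw in rows:
    efficiency = sustained_gflops / (peak_tflops * 1000.0)  # sustained / peak
    mflops_per_watt = sustained_gflops / power_kw           # GFLOPs/kW equals MFLOPs/W
    print(f"{name}: {efficiency:.1%} sustained efficiency, "
          f"{mflops_per_watt:.2f} MFLOPs/Watt")

# Matches the table: bluevista ~7.8% / 1.34, bluesky ~4.3% / 0.92, frost ~6.8% / 15.5.
```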
Conclusion & Questions
• Read more about it:
  – http://www.scd.ucar.edu/main/computers.html
• Questions or Comments?
• Special thanks to:
  – Lynda Lester / Pam Gillman (SCD): Photographs
  – Tom Engel (SCD): Utilization Charts / Stats
  – Irfan Elahi / John Clyne (SCD): Fun Facts
  – Ben Foster (HAO): TIME-GCM Data
  – Sean McCreary (CSS/CU): BG/L Research Areas
  – BJ Heller (SCD): Production Assistance