Columbia Supercomputer and the NASA Research & Education Network
WGISS 19: CONAE, March 2005
David Hartzell, NASA Ames / CSC
dhartzell@arc.nasa.gov

Agenda
• Columbia
• NREN-NG
• Applications

NASA's Columbia System
• NASA Ames has embarked on a joint Linux supercomputer project with SGI and Intel.
  – Initially twenty 512-processor Intel IA-64 SGI Altix nodes
  – NREN-NG: an optical support WAN
    • NLR will be the optical transport for this network, delivering high bandwidth to other NASA centers.
• Achieved 51.9 teraflops with all 20 nodes in November 2004 (a rough sanity check of this figure follows the installation notes below)
• Currently 2nd on the Top500 list
  – Other systems have come online that are now faster.

Columbia

Preliminary Columbia Uses
• Space Weather Modeling Framework (SWMF): developed at the University of Michigan under the NASA Earth Science Technology Office (ESTO) Computational Technologies (CT) Project to provide "plug and play" Sun-to-Earth simulation capabilities to the space physics modeling community.
• Estimating the Circulation and Climate of the Ocean (ECCO): continued success in ocean modeling has improved the model, and the work continued during very busy Return to Flight use of Columbia.
• Finite-volume General Circulation Model (fvGCM): very promising results from 1/4° fvGCM runs encouraged its use for real-time weather prediction during hurricane seasons; one goal is to predict hurricanes accurately well in advance.
• Return to Flight (RTF): simulations of tumbling debris from foam and other sources are being used to assess the threat that shedding such debris poses to various elements of the Space Shuttle Launch Vehicle.

20 Nodes in Place
• Kalpana was on site at the beginning of the project.
• The first two new systems were received on 28 June and placed into service that week.
• As of late October 2004, all systems were in place.
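With all 20 nodes in place, the 51.9 teraflop Linpack result quoted earlier can be sanity-checked with simple arithmetic. The short Python sketch below is illustrative only: the per-processor peak rate is an assumption (1.5 GHz Itanium 2, 4 floating-point operations per cycle), not a figure taken from these slides.

  # Back-of-the-envelope check of Columbia's sustained Linpack number.
  # Assumed (not from the slides): 1.5 GHz Itanium 2, 4 FP ops per cycle.
  NODES = 20                 # Altix nodes
  PROCS_PER_NODE = 512       # processors per node
  CLOCK_GHZ = 1.5            # assumed clock rate
  FLOPS_PER_CYCLE = 4        # assumed FP ops per cycle (two FMA units)

  total_procs = NODES * PROCS_PER_NODE                    # 10,240 processors
  peak_tflops = total_procs * CLOCK_GHZ * FLOPS_PER_CYCLE / 1000.0

  achieved_tflops = 51.9     # reported November 2004 result
  print(f"processors:    {total_procs}")
  print(f"assumed peak:  {peak_tflops:.1f} TFlops")
  print(f"achieved/peak: {achieved_tflops / peak_tflops:.0%}")

Under these assumptions the aggregate peak is roughly 61 TFlops, so the 51.9 TFlops measurement corresponds to about 84% Linpack efficiency.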
Power
• Ordered and received twenty 125 kW PDUs
• Upgrade / installation of power distribution panels

Cooling
• New floor tiles
• Site visits conducted
• Plumbing in HSPA and HSPB complete
• Heating-problem contingency plans developed

Networking
• Each Columbia node has four 1 GigE interfaces and one 10 GigE interface
• Plus Fibre Channel and InfiniBand
• Required all new fiber and copper infrastructure, plus switches

Components
• Front end: 128-processor Altix 3700 (RTF)
• Networking: 10 GigE switch (32-port); 10 GigE cards (1 per 512p); InfiniBand switch (288-port); InfiniBand cards (6 per 512p); Altix 3900 2048 NUMAlink kits
• Compute nodes: Altix 3700, 12 x 512p; "Altix 3900", 8 x 512p
• Storage area network: Brocade switch, 2 x 128-port
• Storage (440 TB): FC RAID, 8 x 20 TB (8 racks); SATA RAID, 8 x 35 TB (8 racks)

NREN Goals
• Provide a wide-area, high-speed network for large data distribution and real-time interactive applications
• Provide access to NASA research and engineering communities; primary focus: supporting distributed data access to/from Columbia
• Provide access to federal and academic entities via peering with High Performance Research and Engineering Networks (HPRENs)
• Perform early prototyping and proofs-of-concept of new technologies that are not yet ready for the production network (NASA Integrated Services Network, NISN)

NREN-NG
• The NREN Next Generation (NG) wide-area network will be expanded from OC-12 to 10 GigE within the next 3-4 months to support Columbia applications.
• NREN will "ride" the National Lambda Rail (NLR) to reach the NASA research centers and major exchange locations.

Approach: Implementation Plan, Phase 1
[Map of the NREN-NG target network: 10 GigE over NLR linking ARC/NGIX-West (NLR Sunnyvale), JPL (NLR Los Angeles), StarLight (NLR Chicago), GRC (NLR Cleveland), GSFC / NGIX-East / MATP / LRC, MSFC (NLR MSFC), and JSC (NLR Houston); NREN sites and peering points marked.]

NREN-NG Progress
• Equipment order has been finalized.
• Construction of the network is starting from west to east.
• A temporary 1 GigE connection to JPL is in place, moving to 10 GigE by the end of summer.
• Current NREN paths to/from Columbia are seeing gigabit/s transfers.
• NREN-NG will ride the National Lambda Rail network in the US.

The NLR
• National Lambda Rail (NLR)
• NLR is a U.S. consortium of educational institutions and research entities that partnered to build a nationwide fiber network for research activities.
  – NLR offers wavelengths to members and/or Ethernet transport services.
  – NLR is buying a 20-year right-to-use of the fiber.

NLR – Optical Infrastructure – Phase 1
[Map of the NLR Phase 1 route (NLR Route / NLR Layer 1): Seattle, Portland, Boise, Chicago, Denver, Ogden/Salt Lake, Cleveland, Pittsburgh, Kansas City, Washington DC, Raleigh, Los Angeles, San Diego, Atlanta, Jacksonville.]

Some Current NLR Members
• CENIC
• Pacific Northwest GigaPOP
• Pittsburgh Supercomputing Center
• Duke (coalition of NC universities)
• Mid-Atlantic Terascale Partnership
• Cisco Systems
• Internet2
• Florida LambdaRail
• Georgia Institute of Technology
• Committee on Institutional Cooperation (CIC)
• Texas / LEARN
• Cornell
• Louisiana Board of Regents
• University of New Mexico
• Oklahoma State Regents
• UCAR/FRGP
Plus agreements with:
• SURA (AT&T fiber donation)
• Oak Ridge National Lab (ORNL)

NLR Applications
• Pure optical wavelength research
• Transport of research and education traffic (like Internet2/Abilene today)
• Private transport of member traffic
• Experience operating and managing an optical network
  – Development of new technologies to integrate optical networks into existing legacy networks

Columbia Applications: Distribution of Large Data Sets
• Finite-Volume General Circulation Model (fvGCM): global atmospheric model
• Requirements (Goddard - Ames):
  – ~23 million points
  – 0.25-degree global grid
  – 1 terabyte data set for a 5-day forecast

                                           Bandwidth [Gbit/s]   Transfer time
                                           (LAN / WAN)          (hours)
  Current GSFC - Ames performance          1.00 / 0.155         17 - 22
  GSFC - Ames (1 GigE LAN / 10 GigE WAN)   1.00 / 10.00         3 - 5
  GSFC - Ames (full 10 GigE)               10.00 / 10.00        0.4 - 1.1

• No data compression required prior to data transfer
• Assumes BBFTP for file transfers, instead of FTP or SCP
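The transfer-time estimates in the table above, and in the ECCO and disaster-recovery tables later in the deck, reduce to simple arithmetic: sustained throughput = data volume / transfer time. The Python sketch below back-calculates the sustained rates implied by the quoted fvGCM figures; the data volume (1 TB per 5-day forecast) and the hour ranges come from the slide, while the link rates (OC-3 = 0.155 Gbit/s, GigE = 1 Gbit/s, 10 GigE = 10 Gbit/s) are standard values.

  # Sustained throughput implied by the quoted fvGCM transfer times.
  # The limiting link rate for each case is the slower of LAN and WAN.

  def implied_gbps(terabytes, hours):
      bits = terabytes * 8e12               # decimal terabytes -> bits
      return bits / (hours * 3600.0) / 1e9  # sustained Gbit/s

  cases = [
      # (label, limiting link in Gbit/s, (best hours, worst hours))
      ("Current (OC-3 WAN)",        0.155, (17.0, 22.0)),
      ("1 GigE LAN / 10 GigE WAN",  1.0,   (3.0, 5.0)),
      ("Full 10 GigE",              10.0,  (0.4, 1.1)),
  ]

  for label, link, (best, worst) in cases:
      hi = implied_gbps(1.0, best)          # fastest quoted transfer
      lo = implied_gbps(1.0, worst)         # slowest quoted transfer
      print(f"{label}: {lo:.2f} - {hi:.2f} Gbit/s sustained "
            f"({lo/link:.0%} - {hi/link:.0%} of the limiting link)")

For the current OC-3 path this works out to roughly 0.10 - 0.13 Gbit/s sustained, about 65 - 84% of the 155 Mbit/s link, consistent with the slide's note that BBFTP (rather than FTP or SCP) is assumed so the link stays well utilized.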
Columbia Applications: Distribution of Large Data Sets
• ECCO: Estimating the Circulation and Climate of the Ocean, a joint activity among Scripps, JPL, MIT and others
• Run requirements are increasing as model scope and resolution expand:
  – November '03: 340 GBytes / day
  – February '04: 2000 GBytes / day
  – February '05: 4000 GBytes / day (est.)

                                 Bandwidth [Gbit/s]   Transfer time
                                 (LAN / WAN)          (hours)
  Previous NREN performance      1.0 / 0.155          6 - 12
  NREN Feb 2005 (CENIC 1G)       1.0 / 1.0            0.6 - 0.9
  Projected NREN (CENIC 10G)     10.0 / 10.0          0.2 - 0.4

• Bandwidth can be the limiter for distributed, data-intensive applications
• Need high-bandwidth alternatives and better file-transfer options

hyperwall-1: large images

Columbia Applications: Disaster Recovery / Backup
• Transfer up to seven 200-gigabyte files per day between Ames and JPL
• Limiting factors:
  – Bandwidth: recent upgrade from OC-3 POS to 1 Gigabit Ethernet
  – Compression: 4:1 compression was used for WAN transfers at lower bandwidths; compression limited throughput to 29 Mbps (an end-host constraint)

  Projected transfer improvement   Compression required   Bandwidth [Gbit/s]   Transfer time
                                                          (LAN / WAN)          (hours)
  JPL - Ames (OC-3 POS)            Yes (4:1)              1.00 / 0.155         27 - 31
  JPL - Ames (CENIC 1 GigE)        No                     1.00 / 1.00          4.4 - 6.2
  JPL - Ames (CENIC 10 GigE)       No                     10.00 / 10.00        0.6 - 1.5

Thanks.
David Hartzell
dhartzell@arc.nasa.gov