Maui High Performance Computing Center
Open System Support
An AFRL, MHPCC, and UH Collaboration
December 18, 2007
Mike McCraney, MHPCC Operations Director

Agenda
• MHPCC Background and History
• Open System Description
• Scheduled and Unscheduled Maintenance
• Application Process
• Additional Information Required
• Summary and Q/A

An AFRL Center
• An Air Force Research Laboratory Center, operational since 1993
• Managed by the University of Hawaii
  – Subcontractor partners: SAIC / Boeing
• A DoD High Performance Computing Modernization Program (HPCMP) Distributed Center
• Task Order Contract
  – Maximum estimated ordering value = $181,000,000 (performance dependent)
  – 10 years: 4-year base period with two 3-year term awards

A DoD HPCMP Distributed Center
• Organization: Director, Defense Research and Engineering / DUSD (Science and Technology) / High Performance Computing Modernization Program
• Major Shared Resource Centers
  – Aeronautical Systems Center (ASC)
  – Army Research Laboratory (ARL)
  – Engineer Research and Development Center (ERDC)
  – Naval Oceanographic Office (NAVO)
• Allocated Distributed Centers
  – Army High Performance Computing Research Center (AHPCRC)
  – Arctic Region Supercomputing Center (ARSC)
  – Maui High Performance Computing Center (MHPCC)
  – Space and Missile Defense Command (SMDC)
• Dedicated Distributed Centers
  – ATC, AFWA, AEDC, AFRL/IF, Eglin, FNMOC, JFCOM/J9, NAWC-AD, NAWC-CD, NUWC, RTTC, SIMAF, SSCSD, WSMR

MHPCC HPC History
• 1994 – IBM P2SC Typhoon installed
• 2000 – IBM P3 Tempest installed
• 2001 – IBM Netfinity Huinalu installed
• 2002 – IBM P2SC Typhoon retired
• 2002 – IBM P4 Tempest installed
• 2004 – LNXi Evolocity II Koa installed
• 2005 – Cray XD1 Hoku installed
• 2006 – IBM P3 Tempest retired
• 2007 – Dell PowerEdge Jaws installed
• 2007 – IBM P4 Tempest reassigned
[Chart: MHPCC HPC Growth, 1996–2007 – left axis Memory/Processors, right axis Disk (TB)/TFlops; series: Processors, Memory, Disk, TFlops]

Hurricane Configuration Summary
Current Hurricane configuration:
• Eight 32-processor / 32 GB "nodes" – IBM P690 Power4
• Jobs may be scheduled across nodes for a total of 288p
• Shared-memory jobs can span up to 32p and 32 GB
• 10 TB shared disk available to all nodes
• LoadLeveler scheduling
• One job per node – 32p chunks – can only support 8 simultaneous jobs
Issues:
• Old technology, reaching end of life, upgradability issues
• Cost prohibitive – power consumption is constant; ~$400,000 annual power cost

Dell Configuration Summary
Proposed Shark configuration:
• 40 4-processor / 8 GB "nodes" – Intel 3.0 GHz dual-core Woodcrest processors
• Jobs may be scheduled across nodes for a total of 160p
• Shared-memory jobs can span up to 8p and 16 GB
• 10 TB shared disk available to all nodes
• LSF scheduler
• One job per node – 8p chunks – can support up to 40 simultaneous jobs
Features/Issues:
• Shared use as open system and TDS (test and development system)
• Much lower power cost – Intel power management
• System already maintained and in use
• System covered 24x7 – UPS, generator
• Possible short-notice downtime

Jaws Architecture
• Cisco 6500 core
• Head node for system administration
  – "Build" nodes
  – Running parallel tools (pdsh, pdcp, etc.) – see the example below
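    Example (illustrative only): a minimal sketch of the parallel tools named above,
    run from the head node. The node names are hypothetical, not taken from the slides.

      # Run a command on a set of compute nodes in parallel (node names are made up)
      pdsh -w n[001-004] uptime

      # Push a file from the head node to the same nodes in parallel
      pdcp -w n[001-004] /etc/ntp.conf /etc/ntp.conf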
  – SSH communications between nodes
  – Localized InfiniBand network
  – Private Ethernet
• Dell Remote Access Controllers (private Ethernet)
  – Remote power on/off
  – Temperature reporting
  – Operability status
  – Alarms
  – 10 blades per chassis
• 24 Lustre I/O nodes and 1 MDS, connected over Fibre Channel to DDN storage
• CFS Lustre filesystem, 200 TB
  – Shared access
  – High performance
  – Uses the InfiniBand fabric
• Gig-E to the nodes with 10 Gig-E uplinks (40 nodes per uplink)
• 1280 batch simulation-engine nodes (5120 cores)
• Interconnects: Cisco InfiniBand (copper), 10 Gig-E Ethernet, Fibre Channel
• 3 interactive webtop nodes (12 cores) for user access
• External network connectivity via DREN

Shark Software
Systems software:
• Red Hat Enterprise Linux v4 – 2.6.9 kernel
• InfiniBand – Cisco software stack
• MVAPICH – MPICH 1.2.7 over IB library
• GNU 3.4.6 C/C++/Fortran
• Intel 9.1 C/C++/Fortran
• Platform LSF HPC 6.2
• Platform Rocks

Maintenance Schedule
Current:
• 2:00pm – 4:00pm
• 2nd and 4th Thursday (as necessary)
• Check the website (mhpcc.hpc.mil) for maintenance notices
New proposed schedule:
• 8:00am – 5:00pm
• 2nd and 4th Wednesdays (as necessary)
• Check the website for maintenance notices
Notes:
• Maintenance is taken only on scheduled systems
• Check on Mondays before submitting jobs

Account Applications and Documentation
• Contact the Helpdesk or the website for application information
Documentation needed:
• Account names, systems, special requirements
• Project title, nature of work, accessibility of code
• Nationality of applicant
• Collaborative relevance with AFRL
New requirements:
• "Case file" information
  – For use in AFRL research collaboration
  – Future AFRL applicability
  – Intellectual property shared with AFRL
Annual account renewals:
• September 30 is the final day of the fiscal year

Summary
• Anticipated migration to Shark
• Should be more productive and able to support a wide range of jobs
• Cutting-edge technology
• Cost savings relative to Hurricane (~$400,000 annually)
• Stay tuned for the timeline – likely end of January or early February

Mahalo
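Appendix: Example Job Submission on Shark
For reference, a minimal sketch of an LSF batch script for the proposed Shark system. The job name, file names, and resource request below are illustrative assumptions, not taken from the slides, and the MPI launch command depends on how MVAPICH (MPICH 1.2.7 over IB) and Platform LSF HPC 6.2 are integrated on the system.

  #!/bin/bash
  # example.lsf – hypothetical LSF batch script (names and sizes are assumptions)
  #BSUB -J mpi_example            # job name
  #BSUB -n 8                      # request 8 processors (one node's worth on Shark)
  #BSUB -o mpi_example.%J.out     # standard output file (%J expands to the job ID)
  #BSUB -e mpi_example.%J.err     # standard error file

  # Build the user's MPI source with the MVAPICH compiler wrapper
  mpicc -o hello hello.c

  # Launch the MPI executable; the exact launcher (mpirun, mpirun_rsh, or an
  # LSF-provided wrapper) depends on the site's MVAPICH/LSF integration
  mpirun -np 8 ./hello

Submit the script with "bsub < example.lsf" and check its status with "bjobs".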