DOE Perspective on Cyberinfrastructure - LBNL
Gary Jung
Manager, High Performance Computing Services
Lawrence Berkeley National Laboratory
Educause CCI Working Group Meeting
November 5, 2009
Midrange Computing
• DOE ASCR hosted a workshop in Oct 2008 to assess the role of
midrange computing in the Office of Science; the workshop found that
this class of computation plays an increasingly important role in
enabling Office of Science research.
• Although it is not part of ASCR's mission, midrange computing and
the associated data management play a vital and growing role in
advancing science in disciplines where capacity is as important as
capability.
• Demand for midrange computing services is…
  o growing rapidly at many sites (>30% growth annually at LBNL; see the doubling-time sketch after this list)
  o the direct expression of a broad scientific need
• Midrange computing is a necessary adjunct to leadership-class
facilities
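
To put the >30% annual growth figure in perspective, here is a small compounding calculation. Only the growth rate comes from the slide; treating it as a steady rate is an assumption for illustration.

```python
import math

# Assumes the >30% annual growth quoted for LBNL holds steady
# (an illustration, not a projection from the talk).
annual_growth = 0.30

# Doubling time under compound growth: ln(2) / ln(1 + r)
doubling_time_years = math.log(2) / math.log(1 + annual_growth)
print(f"Doubling time at 30%/yr: {doubling_time_years:.1f} years")  # ~2.6 years

# Demand relative to today after n years of compounding
for years in (1, 3, 5):
    factor = (1 + annual_growth) ** years
    print(f"{years} years -> {factor:.2f}x today's demand")
```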
Berkeley Lab Computing
• Gap between desktop and National Centers
• Midrange Computing Working Group 2001
• Cluster support program started in 2002
  o Services for PI-owned clusters include: pre-purchase consulting,
    development of specs and RFPs, facilities planning, installation and
    configuration, ongoing cluster support, user services consulting,
    cybersecurity, and computer room colocation
• Currently 32 clusters in production, over 1400 nodes, 6500
processor cores
• Funding: the institution covers infrastructure costs and technical
  development; researchers pay for the cluster and the incremental
  cost of support.
Cluster Support Phase II: Perceus Metacluster
• All clusters interconnected into shared cluster infrastructure
  o Permits sharing of resources and storage
     Global home file system
  o One ‘super master’ node, used to boot nodes across all clusters
     Multiple system images supported
  o One master job scheduler, submitting to all clusters
  o Simplifies provisioning new systems and ongoing support
• Metacluster model made possible by Perceus software
  o Successor to Warewulf (http://www.perceus.org)
  o Can run jobs across clusters, recapturing stranded capacity (illustrative sketch below)
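
As a rough illustration of the "one master scheduler, many clusters" idea, the toy dispatcher below places each job on whichever cluster currently has enough idle nodes, which is how otherwise stranded capacity gets recaptured. This is a sketch only: it is not Perceus or the actual LBNL scheduler, and the cluster names and node counts are made up.

```python
from dataclasses import dataclass

# Toy model of the metacluster scheduling idea: one dispatcher, many
# PI-owned clusters. Hypothetical names and sizes, not real LBNL systems.

@dataclass
class Cluster:
    name: str
    total_nodes: int
    busy_nodes: int = 0

    @property
    def idle_nodes(self) -> int:
        return self.total_nodes - self.busy_nodes


def dispatch(job_nodes: int, clusters: list) -> str:
    """Place a job on the cluster with the most idle nodes that can fit it."""
    candidates = [c for c in clusters if c.idle_nodes >= job_nodes]
    if not candidates:
        return "queued"  # no cluster can hold the job right now
    best = max(candidates, key=lambda c: c.idle_nodes)
    best.busy_nodes += job_nodes
    return best.name


if __name__ == "__main__":
    metacluster = [
        Cluster("geochem", total_nodes=64, busy_nodes=60),   # nearly full
        Cluster("nanosci", total_nodes=128, busy_nodes=40),  # mostly idle
    ]
    print(dispatch(16, metacluster))  # -> nanosci: idle capacity recaptured
```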
Laboratory-Wide Cluster - Drivers
“Computation lets us understand everything we do.”
– LBNL Acting Lab Director Paul Alivisatos
38% of scientists depend on cluster computing for research.
69% of scientists are interested in cycles on a Lab-owned cluster.
  o early-career scientists are twice as likely to be ‘very interested’ as
    their later-career peers
Why do scientists at LBNL need midrange computing resources?
o ‘on ramp’ activities in preparation for running at supercomputing
centers (development, debugging, benchmarking, optimization)
o scientific inquiry not connected with ‘on ramp’ activities
Laboratory-Wide Cluster “Lawrencium”
• Overhead funded program
  o Capital equipment dollars shifted from business computing
  o Overhead funded staffing - 2 FTE
• Production in Fall 2008
• General purpose Linux cluster suitable for a wide range of applications
  o 198 nodes, 1584 cores, DDR InfiniBand interconnect
  o 40TB NFS home directory storage; 100TB Lustre parallel scratch
  o Commercial job scheduler and banking system
  o #500 on the Nov 2008 Top500
• Open to all LBNL PIs and collaborators on their project
• Users are required to complete a survey when applying for accounts
  and later provide feedback on science results
• No user allocations at this time. This has been successful to date.
Networking - LBLNet
• Peer at 10GbE with ESnet
• 10GbE at core. Moving to 10GbE to the buildings
• Goal is sustained high-speed data flows with cybersecurity
• Network-based IDS approach - traffic is innocent until proven guilty (illustrative sketch below)
  o Reactive firewall
  o Does not impede data flow; no stateful firewall
  o Bro cluster allows us to scale our IDS to 10GbE
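
A minimal sketch of the "innocent until proven guilty" reactive approach: traffic is never held up by an inline stateful firewall; a block rule is installed only after the IDS flags a host. This is illustrative Python only, not the real Bro-to-LBLNet plumbing; the alert format here is invented, and the iptables call simply stands in for whatever blocking mechanism the reactive firewall actually uses.

```python
import subprocess

def handle_alerts(alerts):
    """React to IDS verdicts after the fact, rather than inspecting every
    flow inline. Each alert is a dict like {"src": ip, "verdict": str}
    (a hypothetical format, not Bro's actual output)."""
    blocked = set()
    for alert in alerts:
        if alert["verdict"] == "malicious" and alert["src"] not in blocked:
            # Install a drop rule only once a host is proven guilty.
            subprocess.run(
                ["iptables", "-A", "INPUT", "-s", alert["src"], "-j", "DROP"],
                check=True,
            )
            blocked.add(alert["src"])
    return blocked

# Example with hypothetical IDS output (call commented out: needs root/iptables):
sample = [
    {"src": "203.0.113.7", "verdict": "malicious"},
    {"src": "198.51.100.4", "verdict": "benign"},
]
# handle_alerts(sample)  # would block only 203.0.113.7
```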
Communications and Governance
• General announcements at IT council
• Steering committees used for scientific computing
  o Small group of stakeholders, technical experts, decision makers
  o Helps to validate and communicate decisions
  o Accountability
Challenges
• Funding (past)
  o Difficult for IT to shift funding from other areas of computing to
    support for science
  o Recharge can constrain adoption. Full cost recovery definitely will.
• New Technology (ongoing)
• Facilities (current)
  o Computer room is approaching capacity despite upgrades
     Environmental monitoring
     Plenum in ceiling converted to hot air return
     Tricks to boost underfloor pressure
     Water-cooled doors
  o Underway
     DCIE measurement in process (see the worked example below)
     Tower and heat exchanger replacement
     Data center container investigation
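
DCIE (Data Center infrastructure Efficiency) is simply IT equipment power divided by total facility power. The slide only says the measurement is in progress, so the wattages below are placeholder numbers for illustration.

```python
def dcie(it_power_kw: float, total_facility_power_kw: float) -> float:
    """DCIE = IT equipment power / total facility power (often quoted as a %)."""
    return it_power_kw / total_facility_power_kw

# Hypothetical numbers, not LBNL measurements:
it_load = 450.0         # kW drawn by servers, storage, network gear
facility_total = 600.0  # kW including cooling, UPS losses, lighting

print(f"DCIE = {dcie(it_load, facility_total):.0%}")   # 75%
print(f"PUE  = {facility_total / it_load:.2f}")        # 1.33, the reciprocal
```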
Next Steps
• Opportunities presented by cloud computing
  o Amazon investigation earlier this year. Others ongoing
     Latency-sensitive applications ran poorly, as expected
     Performance dependent on specific use case
     Data migration. Economics of storing vs. moving (see the sketch after this list)
     Certain LBNL factors favor costs for build instead of buy
• Large storage and computation for data analysis
• GPU investigation
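
The "storing vs. moving" question comes down to simple arithmetic over rates and access patterns. The sketch below shows the comparison with placeholder prices and dataset size; none of the dollar figures come from the Amazon investigation.

```python
# Rough cost comparison: keep analysis data in the cloud vs. repeatedly
# moving it out. All prices and sizes are placeholders, not LBNL or AWS figures.

dataset_tb = 50
months = 12

storage_per_tb_month = 100.0   # $/TB-month to keep data in cloud storage
transfer_per_tb = 90.0         # $/TB to move data out over the network
transfers_per_year = 4         # how often results come back for local analysis

store_cost = dataset_tb * storage_per_tb_month * months
move_cost = dataset_tb * transfer_per_tb * transfers_per_year

print(f"Store in cloud for a year: ${store_cost:,.0f}")
print(f"Move out {transfers_per_year}x/year:      ${move_cost:,.0f}")
# Which side wins is driven entirely by these rates and access patterns,
# which is why the slide frames it as an economics question.
```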
Points of Collaboration
• UC Berkeley HPCC
  o Recent high-profile joint projects between UCB and LBNL encourage
    close collaboration
  o 25-30% of scientists have dual appointments
  o UC Berkeley's proximity to LBNL facilitates the use of cluster
    services
• University of California Shared Research Computing Services
pilot (SRCS)
  o LBNL and SDSC joint pilot for the ten UC campuses
  o Two 272-node clusters located at UC Berkeley and SDSC
  o Shared computing is more cost-effective
  o Dedicated CENIC L3 connecting network for integration
  o Pilot consists of 24 research projects