Science in the Clouds and Beyond
Lavanya Ramakrishnan
Computational Research Division (CRD) & National Energy Research Scientific Computing Center (NERSC)
Lawrence Berkeley National Lab

The goal of Magellan was to determine the appropriate role for cloud computing for science
• User interfaces/Science Gateways: use of clouds to host science gateways and/or access to cloud resources through science gateways
• Hadoop File System
• MapReduce Programming Model/Hadoop

[Survey responses, 0%-90% scale, for the following items:]
• Cost associativity (i.e., I can get 10 CPUs for 1 hr now or 2 CPUs for 5 hrs at the same cost)
• Easier to acquire/operate than a local cluster
• Exclusive access to the computing resources/ability to schedule independently of other groups/users
• Ability to control groups/users
• Ability to share setup of software or experiments with collaborators
• Ability to control software environments specific to my application
• Access to on-demand (commercial) paid resources closer to deadlines
• Access to additional resources

Breakdown by Program Office:
• Advanced Scientific Computing Research: 17%
• High Energy Physics: 20%
• Biological and Environmental Research: 9%
• Nuclear Physics: 13%
• Basic Energy Sciences - Chemical Sciences: 10%
• Advanced Networking Initiative (ANI) Project: 3%
• Fusion Energy Sciences: 10%
• Other: 14%

Magellan was architected for flexibility and to support research
• Compute Servers: 720 compute servers, Nehalem dual quad-core 2.66 GHz, 24 GB RAM, 500 GB disk; totals: 5760 cores, 40 TF peak, 21 TB memory, 400 TB disk
• IO Nodes (9), Mgt Nodes (2), Gateway Nodes (27)
• Big Memory Servers: 2 servers, 2 TB memory
• Flash Storage Servers: 10 compute/storage nodes, 8 TB high-performance flash, 20 GB/s bandwidth
• QDR InfiniBand interconnect, aggregation switch
• Global Storage (GPFS): 1 PB
• Archival Storage
• ESNet 10 Gb/s router; ANI 100 Gb/s (future)

Science + Clouds = ?
• Business model for Science
• Performance and Cost
• Data Intensive Science
• Technologies from Cloud

Scientific applications with minimal communication are best suited for clouds
[Figure: runtime relative to Magellan (non-VM), 0-18x, lower is better, for GAMESS, GTC, IMPACT, fvCAM, and MAESTRO256 on Carver, Franklin, Lawrencium, EC2-Beta-Opt, Amazon CC, and Amazon EC2]
Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud, CloudCom 2010

Scientific applications with minimal communication are best suited for clouds
[Figure: runtime relative to Magellan, 0-60x, lower is better, for MILC and PARATEC on Carver, Franklin, Lawrencium, EC2-Beta-Opt, and Amazon EC2]

The principal decrease in bandwidth occurs when switching to TCP over IB
[Figure: HPCC PingPong bandwidth (GB/s) vs. number of cores (32-1024) for IB, TCPoIB, 10G-TCPoEth, and Amazon CC; higher is better; 2X and 5X gaps are marked relative to the best configuration]
Evaluating Interconnect and Virtualization Performance for High Performance Computing, ACM Performance Evaluation Review 2012
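The figure above, and the latency and contention figures that follow, come from the HPCC PingPong and RandomRing tests. As a rough illustration of what a ping-pong measurement does, the sketch below bounces a message between two MPI ranks and reports the achieved bandwidth. It is not the HPCC code itself: mpi4py is an assumption, and the 1 MB message size and 100 iterations are arbitrary choices.

```python
# Minimal MPI ping-pong sketch (illustrative only -- not the HPCC benchmark code).
# Assumes mpi4py and NumPy are available; run with, e.g.: mpirun -n 2 python pingpong.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

NBYTES = 1 << 20                      # 1 MB message (arbitrary choice)
ITERS = 100                           # round trips to average over (arbitrary choice)
buf = np.zeros(NBYTES, dtype=np.uint8)

comm.Barrier()                        # start both ranks together
t0 = MPI.Wtime()
for _ in range(ITERS):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)     # ping
        comm.Recv(buf, source=1, tag=1)   # pong
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=1)
t1 = MPI.Wtime()

if rank == 0:
    round_trip = (t1 - t0) / ITERS
    # Bandwidth over the round trip; latency would be measured the same way,
    # but with very small (e.g. 8-byte) messages instead of large ones.
    print(f"bandwidth ~ {2 * NBYTES / round_trip / 1e9:.3f} GB/s")
```

The plotted configurations differ only in the transport underneath calls like these: native IB, TCP over IB, 10G or 1G Ethernet, virtualized or not.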
Ethernet connections are unable to cope with significant amounts of network communication
[Figure: random order (RandomRing) bandwidth (GB/s) vs. number of cores (32-1024) for IB, TCPoIB, 10G-TCPoEth, Amazon CC, 10G-TCPoEth VM, and 1G-TCPoEth; higher is better, scale 0-0.45 GB/s]

The principal increase in latency occurs for TCP over IB even at mid-range concurrency
[Figure: HPCC PingPong latency (µs) vs. number of cores (32-1024) for the same six configurations; lower is better, scale 0-250 µs; a 40X gap is marked relative to the best configuration]

Latency is affected by contention by a greater amount than the bandwidth
[Figure: HPCC RandomRing latency (µs) vs. number of cores (32-1024) for the same six configurations; lower is better, scale 0-500 µs; the 10G VM case shows a 6X gap]

Clouds require significant programming and system administration support
• STAR performed real-time analysis of data coming from Brookhaven National Lab
• First time the data was analyzed in real time to a high degree
• Leveraged an existing OS image from a NERSC system
• Started out with 20 VMs at NERSC and expanded to ANL

On-demand access for scientific applications might be difficult if not impossible
[Figure: number of cores required to run a job immediately upon submission to Franklin]

Public clouds can be more expensive than in-house large systems
Component / Cost:
• Compute Systems (1.38B hours): $180,900,000
• HPSS (17 PB): $12,200,000
• File Systems (2 PB): $2,500,000
• Total (annual cost): $195,600,000
Assumes 85% utilization and zero growth in HPSS and file system data. Does not include the 2x-10x performance impact that has been measured. This still captures only about 65% of NERSC's $55M annual budget: no consulting staff, no administration, no support.
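To make the assumptions behind the table concrete, the sketch below simply redoes the arithmetic from the figures on this slide. The implied per-unit rates are backed out from the stated totals rather than taken from any provider's price list, and treating the 1.38B hours as core-hours is an assumption.

```python
# Re-derivation of the cost slide's arithmetic (illustrative only).
# Per-unit rates are backed out from the slide's own totals, not quoted prices,
# and the 1.38B hours are treated as core-hours, which is an assumption.

compute_hours = 1.38e9            # hours per year, from the slide
compute_cost = 180.9e6            # USD per year, from the slide
hpss_pb, hpss_cost = 17, 12.2e6   # archival storage (HPSS)
fs_pb, fs_cost = 2, 2.5e6         # file systems

total = compute_cost + hpss_cost + fs_cost
print(f"implied compute rate : ${compute_cost / compute_hours:.3f} per hour")
print(f"implied archive rate : ${hpss_cost / hpss_pb / 1e6:.2f}M per PB per year")
print(f"annual total         : ${total / 1e6:.1f}M")   # ~$195.6M vs. NERSC's ~$55M budget
# Per the slide: assumes 85% utilization and excludes the measured 2x-10x
# performance impact as well as consulting, administration, and support.
```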
Cloud is a business model and can be applied to HPC centers

Traditional Enterprise IT vs. HPC Centers:
• Typical load average: 30%* vs. 90%
• Computational needs: bounded computing requirements, sufficient to meet customer demand or transaction rates, vs. virtually unbounded requirements; scientists always have larger, more complicated problems to simulate or analyze
• Scaling approach: scale-in, with an emphasis on consolidating onto a node using virtualization, vs. scale-out, with applications running in parallel across multiple nodes

Cloud vs. HPC Centers:
• NIST definition: resource pooling, broad network access, measured service, rapid elasticity, on-demand self-service vs. resource pooling, broad network access, measured service, with limited rapid elasticity and on-demand self-service
• Workloads: high-throughput, modest-data workloads vs. highly synchronous, large-concurrency parallel codes with significant I/O and communication
• Software stack: flexible, user-managed custom software stacks vs. access to parallel file systems and a low-latency, high-bandwidth interconnect, with preinstalled, pretuned application software stacks for performance

Science + Clouds = ?
• Business model for Science
• Performance and Cost
• Data Intensive Science
• Technologies from Cloud

MapReduce shows promise but current implementations have gaps for scientific applications
Promise:
• High throughput workflows
• Scaling up from desktops
Gaps:
• File system: non-POSIX
• Language: Java
• Input and output formats: mostly line-oriented text
• Streaming mode: restrictive input/output model (a minimal streaming sketch is included in the backup material at the end of the deck)
• Data locality: what happens when there are multiple inputs?

Streaming adds a performance overhead
[Figure]
Evaluating Hadoop for Science, in submission

High performance file systems can be used with MapReduce at lower concurrency
[Figure: Teragen (1 TB) time in minutes (0-12, lower is better) vs. number of maps (0-3000) for HDFS and GPFS, with linear and exponential trend lines]

Data operations impact the performance differences
[Figure]

Schemaless databases show promise for scientific applications
[Diagram labels: Brain, schemaless database, manager.x (x3)]
www.materialsproject.org
Source: Michael Kocher, Daniel Gunter
(A schemaless-database sketch is included in the backup material at the end of the deck.)

Data centric infrastructure will need to evolve to handle large scientific data volumes
The Joint Genome Institute, the Advanced Light Source, and other facilities are all facing a data tsunami.

Cloud is a business model and can be applied at DOE supercomputing centers
• Current-day cloud computing solutions have gaps for science
  – performance, reliability, stability
  – programming models are difficult for legacy apps
  – security mechanisms and policies
• HPC centers can adopt some of the technologies and mechanisms
  – support for data-intensive workloads
  – allow custom software environments
  – provide different levels of service

Acknowledgements
• US Department of Energy DE-AC02-05CH11232
• Magellan – Shane Canon, Tina Declerck, Iwona Sakrejda, Scott Campbell, Brent Draney
• Amazon Benchmarking – Krishna Muriki, Nick Wright, John Shalf, Keith Jackson, Harvey Wasserman, Shreyas Cholia
• Magellan/ANL – Susan Coghlan, Piotr T Zbiegiel, Narayan Desai, Rick Bradshaw, Anping Liu
• NERSC – Jeff Broughton, Kathy Yelick
• Applications – Jared Wilkening, Gabe West, Ed Holohan, Doug Olson, Jan Balewski, STAR collaboration, K. John Wu, Alex Sim, Prabhat, Suren Byna, Victor Markowitz
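Backup: a minimal Hadoop Streaming sketch, referenced from the MapReduce slides. Streaming hands each record to an external mapper/reducer as line-oriented text over stdin/stdout, which is where both the restrictive input/output model and the extra overhead noted earlier come from. The word-count logic, file names, and paths below are illustrative placeholders under those assumptions, not code from the Magellan evaluation.

```python
#!/usr/bin/env python
# wordcount_streaming.py -- serves as both mapper and reducer for Hadoop Streaming.
# Hypothetical invocation (jar name and HDFS paths are placeholders):
#   hadoop jar hadoop-streaming.jar \
#       -input /data/text -output /data/counts \
#       -mapper "python wordcount_streaming.py map" \
#       -reducer "python wordcount_streaming.py reduce" \
#       -file wordcount_streaming.py
import sys


def do_map():
    # Streaming feeds the mapper raw text lines; emit tab-separated key/value pairs.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")


def do_reduce():
    # Streaming sorts by key before the reducer, so equal words arrive contiguously.
    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")


if __name__ == "__main__":
    do_map() if sys.argv[1:] == ["map"] else do_reduce()
```

Every record crosses a process boundary and is parsed and re-serialized as text along the way, which is the kind of overhead the "Streaming adds a performance overhead" comparison measures.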
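Backup: a schemaless-database sketch, referenced from the materialsproject.org slide. The slide does not name a particular store, so MongoDB via pymongo is used here purely as an assumption, and the database, collection, and field names are made up. The point it illustrates is that documents in the same collection can carry different fields, so new kinds of results can be stored without a schema migration.

```python
# Illustrative only: MongoDB/pymongo stands in for "a schemaless database";
# the slide does not say which store the pictured pipeline uses.
from pymongo import MongoClient

client = MongoClient("localhost", 27017)      # assumes a local MongoDB instance
tasks = client["science_db"]["tasks"]         # database/collection names are made up

# Two records from different workflow stages, with different fields --
# no table schema has to be declared or migrated first.
tasks.insert_one({
    "material_id": "demo-001",
    "stage": "relaxation",
    "energy_eV": -12.34,
})
tasks.insert_one({
    "material_id": "demo-001",
    "stage": "band_structure",
    "band_gap_eV": 1.1,
    "kpoints": [8, 8, 8],
})

# Queries can still filter on whichever fields a document happens to have.
for doc in tasks.find({"material_id": "demo-001"}):
    print(doc["stage"], sorted(k for k in doc if k not in ("_id", "material_id", "stage")))
```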