Cloud Computing for Science

Keith R. Jackson, krjackson@lbl.gov
Computational Research Division

Why Clouds for Science?
•  On-demand access to computing and cost associativity
•  Parallel programming models for data-intensive science
   –  e.g., BLAST parametric runs
•  Customized and controlled environments
   –  e.g., Supernova Factory codes have sensitivity to OS/compiler versions
•  Overflow capacity to supplement existing systems
   –  e.g., Berkeley Water Center has analysis that far exceeds the capacity of desktops

Cloud Early Evaluations
•  How do DOE/LBL workloads perform in cloud environments?
   –  What is the impact on performance from virtual environments?
   –  What do cloud programming models like Hadoop offer to scientific experimentation?
•  What does it take for scientific applications to run in cloud environments such as Amazon EC2?
•  Do clouds provide an alternative for data-intensive interactive science?

Scientific Workloads at LBL
•  High performance computing codes
   –  supported by NERSC and other supercomputing centers
•  Mid-range computing workloads
   –  serviced by LBL/IT Services and other local cluster environments
•  Interactive data-intensive processing
   –  usually run on scientists' desktops

NERSC-6 Benchmarking
•  Subset of NERSC-6 application benchmarks for EC2 with smaller input sizes
   –  represent the requirements of the NERSC workload
   –  rigorous process for selection of codes
   –  workload and algorithm/science-area coverage
•  Run on EC2 high-CPU XL (64-bit) nodes
   –  Intel C/Fortran compilers
   –  OpenMPI patched for cross-subnet communication
   –  $0.80/hour

Experiments on Amazon EC2
(Slowdown factor and reduction in SSP are relative to Franklin.)
•  CAM – Climate (BER); Navier-Stokes CFD
   –  Configuration: 200 processors, standard IPCC5 D-mesh resolution
   –  Slowdown factor 3.05; reduction (SSP) 0.33
   –  Could not complete the 240-processor run due to transient node failures; some I/O and small messages
•  MILC – Lattice Gauge Physics (NP); conjugate gradient, sparse matrix, FFT
   –  Configuration: weak-scaled lattice on 8, 32, 64, 128, and 256 processors
   –  Slowdown factor 2.83; reduction (SSP) 0.35
   –  Erratic execution times
•  IMPACT-T – Accelerator Physics (HEP); PIC, FFT component
   –  Configuration: 64 processors, 64x128x128 grid and 4M particles
   –  Slowdown factor 4.55; reduction (SSP) 0.22
   –  PIC portion performs well, but the 3D FFT is poor due to small message size
•  MAESTRO – Astrophysics (HEP); low Mach hydro, block-structured-grid multiphysics
   –  Configuration: 128 processors for a 128^3 computational mesh
   –  Slowdown factor 5.75; reduction (SSP) 0.17
   –  Small messages and all-reduce for the implicit solve
Mid-range codes on Amazon EC2
•  Lawrencium cluster – 64-bit, dual sockets per node, 8 cores per node, 16GB memory, InfiniBand interconnect
•  EC2 – 64-bit, 2 cores per node; 75GB, 15GB, and 7GB memory

Codes and slowdown factors:
•  FMMSpeed – Fast Multipole Method; Pthread-parallel code with ½ GB I/O – slowdown 1.3 to 2.1
•  GASBOR – genetic-algorithm ab initio reconstruction; serial workload, minimal I/O (KB) – slowdown 1.12 to 3.67
•  ABINIT – DFT code that calculates the energy, charge density, and electronic structure of molecules and periodic solids; parallel MPI, minimal I/O – slowdown 1.11 to 2.43
•  HPCC – HPC Challenge benchmark – slowdown 2.8 to 8.8
•  VASP – simulates properties of systems at the atomic scale; MPI-parallel application – slowdown 14.2 to 22.4
•  IMB – Intel (formerly Pallas) MPI Benchmarks; Alltoall among all MPI tasks – slowdown 12.7 to 15.79
Performance Observations
•  Setup to look like a conventional GigE cluster
   –  achieves 0.26 × the SSP (Sustained System Performance) of Franklin per CPU
   –  but must evaluate throughput per dollar
•  Performance characteristics
   –  Good TCP performance for large messages (see the message-size sketch below)
   –  Nonuniform execution times (VMMs have lots of noise/jitter)
•  Bare-metal access to hardware
   –  High overhead for small messages
   –  No OS bypass (it's a VMM), so no efficient one-sided messaging
   –  Poor shared-disk I/O (good local I/O)
   –  Need a more robust (InfiniBand) interconnect
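The large-message/small-message distinction above is easy to see with a simple ping-pong microbenchmark over a range of message sizes. The sketch below uses mpi4py, which is my assumption for illustration only (the NERSC-6 runs used the compiled application codes and OpenMPI directly, not this script); it could be launched on two nodes with something like `mpirun -np 2 python pingpong.py`.

```python
# Ping-pong microbenchmark sketch: measures round-trip time and bandwidth
# between rank 0 and rank 1 across message sizes, to expose the small-message
# overhead noted above. Uses mpi4py (an assumption, not the presenters' code).
from mpi4py import MPI
import numpy as np
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
ITERS = 100

for size in [8, 256, 8 * 1024, 256 * 1024, 8 * 1024 * 1024]:  # message size in bytes
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    t0 = time.perf_counter()
    for _ in range(ITERS):
        if rank == 0:
            comm.Send(buf, dest=1, tag=0)
            comm.Recv(buf, source=1, tag=1)
        elif rank == 1:
            comm.Recv(buf, source=0, tag=0)
            comm.Send(buf, dest=0, tag=1)
    elapsed = time.perf_counter() - t0
    if rank == 0:
        rtt_us = elapsed / ITERS * 1e6
        bw_mb_s = 2 * size * ITERS / elapsed / 1e6
        print(f"{size:>10} B  round trip {rtt_us:10.1f} us  {bw_mb_s:10.1f} MB/s")
```

On a virtualized GigE cluster the small-message round trips are dominated by per-message overhead, while the large-message bandwidth is comparatively healthy, which is the pattern the observations above describe.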
What codes work well?
•  Minimal synchronization, modest I/O requirements
•  Large messages or very little communication
•  Low core counts (non-uniform execution and limited scaling)
•  Generally, applications that would do well on mid-range clusters
   –  mostly run on LBL/IT and local cluster resources today

Integrated Microbial Genomes (IMG)
•  Goal: improving the overall quality of microbial genome data
   –  supporting the comparative analysis of metagenomes
   –  genomes in IMG together with all available GEBA genomes
•  Large amount of sequencing of microbial genomes and metagenome samples using BLAST
   –  the computation must be scheduled within a certain time range
   –  takes about 3 weeks on a modest-sized Linux cluster
   –  projected to exceed current computing resources
•  What can we do to help such applications?
   –  Do cloud computing and tools such as Hadoop help manage the task farming?

Hardware Platforms
•  Franklin: traditional HPC system
   –  40k-core, 360 TFLOP Cray XT4 system at NERSC
   –  Lustre parallel filesystem
•  Planck: traditional mid-range cluster
   –  32-node Linux/x86/InfiniBand cluster at NERSC
   –  GPFS global filesystem and Hadoop on Demand (HOD)
•  Amazon EC2: commercial "Infrastructure as a Service" cloud
   –  Configure and boot customized virtual machines in the cloud
   –  Elastic MapReduce/Hadoop images and S3 for the parallel filesystem
•  Yahoo M45: shared research "Platform as a Service" cloud
   –  400 nodes, 8 cores per node, Intel Xeon E5320, 6GB per compute node, 910.95TB
   –  Hadoop/MapReduce service: HDFS and a shared file system

Software Platforms
•  NCBI BLAST (2.2.22)
   –  Reference IMG genomes of 6.5 million genes (~3GB in size)
   –  Full input set: 12.5 million metagenome genes against the reference
•  BLAST task-farming implementation (see the sketch after this slide)
   –  Server reads inputs and manages the tasks
   –  Client runs BLAST, copies the database to local disk or ramdisk once on startup, and pushes back results
   –  Advantages: fault-resilient, and allows incremental expansion as resources become available
•  Hadoop/MapReduce implementation of BLAST
   –  Hadoop is an open-source implementation of MapReduce
   –  Software framework for processing huge datasets
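The task-farming implementation is only outlined on the slide. As a rough illustration of the pattern it describes, here is a minimal server/client sketch in Python using a simple socket work queue; the host/port, the `/shared/img_refdb` and `/tmp` paths, the batch size, and the legacy `blastall` command line are placeholders, not the actual IMG/NERSC code.

```python
# Minimal sketch of the BLAST task-farming pattern: a server hands out batches
# of query sequences; each client copies the reference database to local disk
# once, runs BLAST on every batch it receives, and exits when work runs out.
import os, pickle, shutil, socket, subprocess, tempfile

HOST, PORT = "0.0.0.0", 5000   # assumed server endpoint
BATCH = 50                     # query sequences per task

def read_fasta_batches(path, batch=BATCH):
    """Yield lists of FASTA records (header plus sequence lines) of size `batch`."""
    records, current = [], []
    with open(path) as f:
        for line in f:
            if line.startswith(">") and current:
                records.append("".join(current))
                current = []
            current.append(line)
        if current:
            records.append("".join(current))
    for i in range(0, len(records), batch):
        yield records[i:i + batch]

def serve(query_fasta="queries.fasta"):
    """Task server: send one pickled batch per client connection until done."""
    batches = read_fasta_batches(query_fasta)
    with socket.socket() as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                try:
                    payload = pickle.dumps(next(batches))
                except StopIteration:
                    payload = b""          # empty reply means no work left
                conn.sendall(payload)

def work(server_host, shared_db="/shared/img_refdb", local_db="/tmp/img_refdb"):
    """Client: copy the database locally once, then loop requesting batches."""
    if not os.path.exists(local_db):
        shutil.copytree(shared_db, local_db)   # one-time copy to local disk/ramdisk
    while True:
        with socket.socket() as s:
            s.connect((server_host, PORT))
            data = b"".join(iter(lambda: s.recv(65536), b""))
        if not data:
            break                              # server is out of tasks
        batch = pickle.loads(data)
        with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as q:
            q.writelines(batch)
        # Legacy BLAST 2.2.x invocation; flags and DB name are illustrative.
        subprocess.run(["blastall", "-p", "blastp",
                        "-d", os.path.join(local_db, "img.faa"),
                        "-i", q.name, "-o", q.name + ".out"], check=False)
        # In the real system the client would push the .out file back to the server.
```

Because clients simply connect and ask for work, new nodes can join mid-run and a failed client only costs its current batch, which matches the fault-resilience and incremental-expansion advantages listed above.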
Hadoop Processing Model
•  Advantages of Hadoop
   –  Broadly supported on cloud platforms
   –  Transparent data replication
   –  Data-locality-aware scheduling
   –  Fault-tolerance capabilities
   –  Dynamic resource management for growth
•  Implementation details (see the streaming mapper sketch below)
   –  Use streaming to launch a script that calls the executable
   –  HDFS for input; a shared file system is needed for the binary and the database
   –  Each sequence needs to be on a single line to use the standard input format reader
   –  Custom input format reader that can understand BLAST sequences
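The streaming wrapper itself is not shown on the slide; the sketch below is a hypothetical mapper along the lines described (one query per input line, BLAST binary and database on a shared filesystem, HDFS only for the query input). The paths, the `blastall` flags, and the job-submission command in the comment are my assumptions, not the Magellan/IMG scripts.

```python
#!/usr/bin/env python
# Hypothetical Hadoop streaming mapper for BLAST. Each input line holds one
# query as "id<TAB>sequence"; the mapper writes it to a temporary FASTA file,
# runs BLAST against a database on a shared filesystem, and emits "id<TAB>hit".
# A streaming job might be launched with something like (flags illustrative):
#   hadoop jar hadoop-streaming.jar -input queries.tsv -output blast_out \
#       -mapper blast_mapper.py -file blast_mapper.py -numReduceTasks 0
import os
import subprocess
import sys
import tempfile

DB = "/shared/img_refdb/img.faa"   # assumed shared-filesystem path to the BLAST DB

def run_blast(seq_id, sequence):
    """Run legacy NCBI BLAST 2.2.x ('blastall') on a single query sequence."""
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as q:
        q.write(f">{seq_id}\n{sequence}\n")
    out = subprocess.run(
        ["blastall", "-p", "blastp", "-d", DB, "-i", q.name, "-m", "8"],
        capture_output=True, text=True)
    os.unlink(q.name)
    return out.stdout

def main():
    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        seq_id, sequence = line.split("\t", 1)
        # Emit tab-separated key/value pairs, one per tabular BLAST hit line.
        for hit in run_blast(seq_id, sequence).splitlines():
            print(f"{seq_id}\t{hit}")

if __name__ == "__main__":
    main()
```

Putting each sequence on its own line is what lets the stock text input format split the work; a custom input format reader, as noted above, would remove that preprocessing step by parsing FASTA records directly.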
Performance Comparison
•  Evaluated a small-scale problem (2,500 sequences) on multiple platforms (limited by access and costs)
•  Similar per-core performance across platforms
[Chart: completion time in seconds versus number of cores (32, 64, 128) for EC2 Hadoop, Planck Hadoop on Demand (HOD), Planck task farming, and Franklin task farming]
Supernova Factory
•  Tools to measure the expansion of the universe and dark energy
   –  image-matching algorithms
   –  data pipeline, task-parallel workflow
   –  large data volume for the supernova search
•  Using Amazon EC2
   –  Stable 32-bit Linux computing environment
   –  Data requirements
      •  about 0.5TB of existing data
      •  about 1TB of storage for 12 months
      •  about 1TB of transfer from the cloud

Berkeley Water Center
•  Studying global-scale environmental processes
   –  integration of local, regional, and global spatial scales
   –  integration across disciplines (e.g., climatology, hydrology, forestry) and methodologies
•  Common eco-science data infrastructure
   –  address quality, heterogeneity, and scale
   –  interfaces and services for accessing and processing data

MODerate-resolution Imaging Spectroradiometer (MODIS)
•  Two MODIS satellites in near-polar orbits
   –  global coverage every one to two days
•  Data integration challenges
   –  ~35 science data products, including atmospheric and land products
   –  products come in different projections and resolutions (spatial and temporal), at different times
   –  data volume and processing requirements exceed desktop capacity

Windows Azure Cloud Solution
•  Lower resource entry barriers
•  Hide the complexities of data collection, reprojection, and management from domain scientists
•  A generic Reduction Service lets scientists upload arbitrary executables to perform scientific analysis on reprojected data (sketch below)
•  90x improvement over the scientist's desktop
[Diagram: MODIS source data → data processing pipeline → scientific results, on the Windows Azure cloud computing platform]
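The Azure implementation of the Reduction Service is not shown in the slides; as a rough illustration of the pattern it describes, the sketch below is a worker loop that applies a scientist-supplied executable to each reprojected tile and collects the outputs. The directory paths, the tile naming scheme, and the executable's calling convention are all assumptions.

```python
# Rough sketch of the "reduction service" pattern described above (not the
# actual Windows Azure implementation): walk the reprojected MODIS tiles for
# one product/year, run an uploaded executable on each tile, gather results.
import pathlib
import subprocess

TILES_DIR = pathlib.Path("/data/modis/reprojected")   # assumed tile store
RESULTS_DIR = pathlib.Path("/data/modis/results")
USER_EXE = "./user_reduction"                          # scientist-uploaded executable

def reduce_tiles(product: str, year: int) -> list[pathlib.Path]:
    """Apply the user executable to every matching tile; return result paths."""
    RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    outputs = []
    for tile in sorted(TILES_DIR.glob(f"{product}_{year}_*.hdf")):
        out_path = RESULTS_DIR / (tile.stem + ".csv")
        # Assumed convention: user_reduction <input tile> <output file>
        subprocess.run([USER_EXE, str(tile), str(out_path)], check=True)
        outputs.append(out_path)
    return outputs

if __name__ == "__main__":
    # e.g., reduce one year of an assumed land-product tile set
    for result in reduce_tiles("MOD13", 2009):
        print("wrote", result)
```

Hiding the reprojection and data management behind this kind of interface is what lets domain scientists bring only their analysis executable, which is the point of the service described above.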
An Enabling Service for Scientists
•  Gives scientists the ability to do analyses that were not possible before
•  Programming model
   –  future experimentation with Dryad/MapReduce frameworks
•  Interactive cases need to be refined
   –  intermediate data products
   –  uploading executables to perform scientific analysis on data

DOE Cloud Research: Magellan Project
•  DOE Advanced Scientific Computing Research (ASCR)
   –  $32.8M project at NERSC and Argonne (ALCF)
   –  ~100 TF/s compute cloud testbed (across sites)
   –  Petabyte-scale storage cloud testbed
•  Mission
   –  Deploy a testbed cloud to serve the needs of mid-range scientific computing
   –  Evaluate the effectiveness of this system for a wide spectrum of DOE/SC applications in comparison with other platform models
   –  Determine the appropriate role for commercial and/or private cloud computing for DOE/SC mid-range workloads

NERSC Magellan Cluster
•  720 nodes, 5,760 cores in 9 Scalable Units (SUs); 61.9 teraflops
•  SU = IBM iDataPlex rack with 640 Intel Nehalem cores
•  18 login/network nodes; 10G Ethernet to the Internet; load balancer and I/O network
•  100-G router to ANI; 8G FC to HPSS (15PB) and the NERSC Global Filesystem (1 petabyte with GPFS)
[Architecture diagram: the nine SUs connected through the login, I/O, and load-balancer networks to the Internet, ANI, HPSS, and the NERSC Global Filesystem]

NERSC Magellan Research Questions
•  What are the unique needs and features of a science cloud?
•  What applications can efficiently run on a cloud?
•  Are cloud computing APIs such as Hadoop effective for scientific applications?
•  Can scientific applications use a DaaS or SaaS model?
•  Is it practical to deploy cloud services across multiple DOE sites?
•  What are the security implications of user-controlled cloud images?
•  What is the cost and energy efficiency of clouds?

Summary
•  Cloud environments impact performance
   –  ongoing work to improve these environments for scientific applications
•  Cloud tools require customizations suitable for scientific data processing
•  Rethinking the service model
   –  support for interactive applications and dynamic software environments

Acknowledgements
•  NERSC benchmarks – Harvey Wasserman, John Shalf
•  IT benchmarks – Greg Bell, Keith Greg Kurtzer, Krishna Muriki, John White
•  BLAST on Hadoop – Victor Markowitz, John Shalf, Shane Canon, Lavanya Ramakrishnan, Shreyas Cholia, Nick Wright
•  Supernova Factory on EC2 – Rollin Thomas, Greg Aldering, Lavanya Ramakrishnan
•  Berkeley Water Center – Deb Agarwal, Catharine van Ingen (MSR), Jie Li (UVa), Youngryel Ryu (UCB), Marty Humphrey (UVa), Windows Azure team