IBM Platform Computing
Reap the Benefits of the Evolving HPC Cloud
By John Russell, Contributing Editor, Bio•IT World
Produced by Cambridge Healthtech Media Group

Harnessing the high performance compute power needed to drive modern biomedical research is a formidable and familiar challenge throughout the life sciences. Modern research-enabling technologies – Next Generation Sequencing (NGS), for example – generate huge datasets that must be processed. Key applications such as genome assembly, genome annotation, and molecular modeling can be data-intensive, compute-intensive, or both. Underlying high performance computing (HPC) infrastructures must evolve rapidly to keep pace with innovation. And not least, cost pressures constrain large and small organizations alike.

In such a demanding and dynamic HPC environment, cloud computing(i) technologies, whether deployed as a private cloud or in conjunction with a public cloud, represent a powerful approach to managing technical computing resources. By breaking down internal compute silos, masking underlying HPC complexity from the community of scientist and clinician researchers, and providing transparency and control to IT managers, cloud computing strategies and tools help organizations of all sizes effectively manage their HPC assets and the growing compute workloads that consume them.

The IBM® Platform Computing™(ii) portfolio has been driving the evolution of distributed computing and the HPC cloud for over 20 years. Ground-breaking products such as IBM® Platform™ LSF® were among the first to enable companies to manage distributed environments ranging from modest clusters to massive compute farms with tens of thousands of processors handling thousands of jobs. Most recently, the introduction of IBM® Platform™ Dynamic Cluster, together with IBM® Platform™ Cluster Manager – Advanced Edition, permits turning LSF environments into a dynamic HPC cloud.
At the heart of all shared technical computing is robust middleware, positioned between the collection of applications and the diverse IT resources, that handles workload scheduling and resource orchestration. IBM Platform Computing products fulfill this critical role, providing powerful solutions for batch-mode computing, service-oriented architectures (SOA), and the innovative MapReduce approaches now being widely adopted in the life sciences.

Sequencing Centers are Early HPC Cloud Adopters

Perhaps not surprisingly, large sequencing centers have been early adopters of high performance cloud computing. Their HPC demands are immense: worldwide annual sequencing capacity was estimated at 13 quadrillion bases at the end of 2011(iii), and a single base pair typically represents about 100 bytes of data (raw, analyzed, and interpreted).

Here's one example of a leading sequencing center employing HPC cloud concepts to cope with its data avalanche. The Wellcome Trust Sanger Institute, a pioneer in the international consortium to sequence the human genome(iv), has relied on IBM Platform LSF for more than a decade. In 2010, the Sanger Institute generated more than 9 petabytes of genomics data(v), and today it produces roughly 120 TB of raw data per week that must be processed for analysis. IBM Platform LSF helps the Institute run up to half a million sequence-matching jobs a day, improving data processing capability, researcher efficiency, and time-to-results.

Indeed, HPC cloud computing can be particularly useful in the life sciences industry, where a long legacy of departmental independence often results in islands of costly but under-used and under-supported HPC resources. Moreover, the intense merger and acquisition activity of the past two decades has exacerbated the industry's challenge of managing disparate systems.
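As an aside on scale, the sequencing figures quoted above imply a striking storage burden. A quick back-of-the-envelope calculation (the constants below are simply the estimates cited in the text, not new data):

```python
# Back-of-the-envelope check of the figures cited above: ~13 quadrillion
# bases of annual worldwide sequencing capacity, at roughly 100 bytes per
# base (raw, analyzed, and interpreted data combined).

ANNUAL_BASES = 13e15      # ~13 quadrillion bases (end-of-2011 estimate)
BYTES_PER_BASE = 100      # approximate total footprint per base

total_bytes = ANNUAL_BASES * BYTES_PER_BASE
total_exabytes = total_bytes / 1e18

print(f"~{total_exabytes:.1f} EB of sequence-related data per year")
```

That works out to on the order of an exabyte of sequence-related data per year worldwide – the kind of volume that makes well-managed, shared HPC infrastructure a necessity rather than a luxury.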
Integrating these IT islands into a well-managed shared environment not only improves efficiency and widens user access but also allows effective rightsizing – up or down – of the company's total HPC infrastructure. Flexibility also improves through the ability to match individual jobs with their best computational resource fit. This matters increasingly as heterogeneous computing (GPGPU-based, FPGA-based, MIC(vi), etc.) is used to accelerate specific life science applications. More broadly, HPC clouds provide a data center transformation that rationalizes use of IT resources, shortens R&D's time-to-results, and ultimately delivers benefits to the business bottom line. IBM Platform Computing technologies can drive this transformation.

Accelerating Batch, SOA, and MapReduce Jobs

There is no one-size-fits-all HPC infrastructure. Company size, HPC workload (e.g., batch or SOA), and user community requirements all influence the nature of the underlying IT architecture. IBM Platform Computing offers a comprehensive range of systems management solutions for distributed HPC environments. Here's a brief snapshot of the core offerings:

IBM Platform LSF. Ideal for accelerating batch-oriented computing, Platform LSF is a powerful workload management platform that scales from small clusters to massive compute farms and grids. It provides comprehensive, policy-driven scheduling features that put all of your compute infrastructure resources to work, ensuring optimal application performance.

IBM® Platform™ HPC. Intended to speed and simplify cluster deployment and use, Platform HPC is a complete solution in a single product. It includes a range of out-of-the-box features designed to reduce the complexity of your HPC environment and improve your time-to-solution. Platform HPC is available only bundled with hardware.

IBM® Platform™ Symphony. When low-latency SOA or MapReduce functionality is required, Platform Symphony is the choice.
It delivers powerful enterprise-class management for running both compute- and data-intensive distributed applications on a scalable, shared grid. It accelerates dozens of parallel applications, for faster results and better utilization of all available resources.

All IBM Platform Computing solutions offer highly flexible, policy-based scheduling models to ensure the right job prioritizations and resource allocations are executed on a continuously updated basis. Charge-back and resource guarantees also help ensure that groups get their share of resources to meet business requirements, making it faster and simpler to deploy heterogeneous applications on the same shared cluster, grid, or cloud. Diverse resources are shared fluidly, bringing utilization closer to 100 percent, which can translate to reduced time-to-results, higher service levels, less labor required to manage IT, and reduced infrastructure costs for your organization.

Turning on the HPC Cloud

The newest member of the IBM Platform Computing lineup – IBM Platform Dynamic Cluster – is available as an add-on to IBM Platform LSF. It turns static Platform LSF clusters into dynamic, shared cloud infrastructure. By automatically changing the composition of clusters to meet ever-changing workload demands, it improves service levels and lets organizations do more work with less infrastructure. Unlike many other solutions, Platform Dynamic Cluster provides the flexibility to automatically provision mixed physical and virtual environments, and it leverages existing investments in hypervisors, management tools, and virtual machine templates to create a dynamic private HPC cloud environment. With intelligent policies and features such as live job migration and automated checkpoint-restart, Platform Dynamic Cluster helps you increase utilization while reducing administrator workload.
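The policy-based scheduling with resource guarantees described above can be illustrated with a toy model. This is a minimal sketch of the general fair-share idea, not Platform LSF's actual algorithm; the group names, share values, and job queues are invented for illustration:

```python
from collections import defaultdict

# Toy fair-share dispatcher (illustrative only, NOT Platform LSF's real
# scheduler). Each group is guaranteed a fraction of the cluster; the
# dispatcher always picks the next job from the group that is currently
# furthest below its guaranteed share.

shares = {"chem": 0.5, "genomics": 0.3, "imaging": 0.2}  # guaranteed shares
used = defaultdict(int)                                  # slots consumed so far

queues = {
    "chem": ["qsar-1", "qsar-2"],
    "genomics": ["assembly-1", "variant-1", "variant-2"],
    "imaging": ["recon-1"],
}

def dispatch():
    """Pick the next job from the most under-served group with work queued."""
    candidates = [g for g in queues if queues[g]]
    if not candidates:
        return None
    # ratio of slots used to guaranteed share = how "over-served" a group is
    group = min(candidates, key=lambda g: used[g] / shares[g])
    used[group] += 1
    return group, queues[group].pop(0)

order = []
while (job := dispatch()) is not None:
    order.append(job)
print(order)
```

Note how the first few dispatches rotate across all three groups before any group gets a second slot: that rotation, generalized with priorities, preemption, and continuously updated usage accounting, is the essence of policy-driven workload management.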
Another key member of the IBM Platform LSF family is IBM Platform Application Center, a powerful and comprehensive web-based portal for HPC job submission and management. This is a critical capability for the life sciences, particularly in genomics, where data analysis pipelines are often complex and involve several HPC applications run by the researcher community. IBM Platform Application Center provides job submission (including customizable application submission forms and built-in templates), job monitoring, web services APIs, and integration with industry-leading ISV applications. By reducing training and support needs, improving resource security, and minimizing user errors, IBM Platform Application Center boosts HPC cloud productivity and increases user-community satisfaction. Standardizing access to applications also makes policy enforcement easier.

It bears repeating that cluster management – in addition to workflow and application management – is a fundamental enabler of HPC cloud and shared computing. Two offerings, IBM Platform Cluster Manager – Standard Edition and IBM Platform Cluster Manager – Advanced Edition, deliver rapid, robust deployment of HPC clusters, which can then be turned into true HPC clouds.

IBM Platform Cluster Manager – Standard Edition is well suited to clients with a single HPC cluster and more static application requirements. It delivers the capability to quickly provision, run, manage, and monitor HPC clusters with unprecedented ease. IBM Platform Cluster Manager – Advanced Edition automates the assembly of multiple high-performance technical computing clusters on a shared compute infrastructure for use by multiple teams.
IBM Platform Cluster Manager – Advanced Edition includes support for multi-tenant HPC clouds and multiple workload managers.(vii) It creates an agile environment for running both technical computing and big data analytics workloads and for consolidating disparate cluster infrastructure, resulting in increased hardware utilization and the ability to meet or exceed service level agreements while lowering costs. It is designed to create a dynamic and flexible multi-tenant HPC environment where individual clusters can be deployed on demand. By consolidating silos of cluster infrastructure, multiple groups can share a common HPC resource pool. Through rapid physical and virtual machine provisioning, entire clusters can be deployed and sized to meet the ever-changing demands of HPC workloads. Administrative overhead is greatly reduced through self-service and a comprehensive administrative console.

Choosing the Best IBM Platform Computing Option

Despite the range of IBM Platform Computing offerings, choosing the most appropriate option for your organization is generally straightforward. Platform LSF, for example, is optimal for batch workloads, whether that means many single-threaded jobs each running for a short time or a single parallelized application running on thousands of processors for hours. Platform LSF optimizes the running of jobs on the HPC resources and ensures the compute facility runs as close to 100 percent utilization as is practical. LSF is available in three editions – Express, Standard, and Advanced – with the best fit based upon the scale of requirements. For many life science organizations with multiple user constituencies running time-consuming, compute-intensive informatics workloads (e.g., QSAR analysis(viii)) that vie for valuable HPC resources, batch processing is often the best approach to maximizing IT resources.
For those building modest-sized clusters and also buying the hardware, IBM Platform HPC is an excellent choice to speed and simplify the process. Based on Platform LSF, Platform HPC also has a number of other capabilities built in (for example, provisioning, cluster management, IBM Platform MPI, and reporting). It features a web-based dashboard for systems monitoring and troubleshooting, along with simplified job submission and management.

In contrast to Platform LSF, Platform Symphony is designed to handle huge volumes of extremely low-latency tasks for both compute- and data-intensive workloads. Historically, the challenge was scheduler overhead: if a job took a tenth of a second to run but the scheduler pushed jobs out only once every ten seconds, the processor sat idle 99 percent of the time on those jobs. Platform Symphony can schedule jobs in milliseconds, with task-to-task latency on the order of four milliseconds.

Platform Symphony also accommodates the life sciences' recent embrace of MapReduce computing – proximity computing in which big data is divided into smaller chunks that are co-located on the nodes where they are processed(ix). Data-intensive informatics applications such as genome assembly, variant calling, and other annotation can be significantly accelerated using MapReduce approaches. A recent performance benchmark by STAC Research showed Platform Symphony scheduling to be 63 times faster than the open source Hadoop implementation. The overall result can be dramatic productivity gains.

As a practical reality, most life science organizations cannot afford – in cost, space, or staff – to acquire and maintain all the HPC infrastructure required to meet every need. The IBM Platform Computing portfolio readily supports "cloud bursting" – reaching beyond your internal cloud to a public cloud as your capacity requirements fluctuate or as specialized hardware needs arise.
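The MapReduce pattern mentioned above can be sketched in a few lines of pure Python. This is a toy, single-process illustration of the map–shuffle–reduce flow (here counting 3-mers in short sequence reads, a simplified stand-in for genomics workloads); real frameworks such as Platform Symphony or Hadoop additionally partition the data across nodes and co-locate tasks with their data chunks:

```python
from collections import defaultdict

# Toy in-memory MapReduce: count 3-mers across a set of sequence reads.
# The reads below are invented examples.

reads = ["GATTACA", "TACAGAT", "GATGAT"]

def map_phase(read, k=3):
    """Map: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each k-mer."""
    return {key: sum(values) for key, values in groups.items()}

intermediate = [pair for read in reads for pair in map_phase(read)]
kmer_counts = reduce_phase(shuffle(intermediate))
print(kmer_counts)
```

Because each map task needs only its own read, the map phase parallelizes trivially across a grid, which is precisely why data-parallel genomics workloads fit the model so well.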
Conclusion

Technical computing has undergone a steady evolution encompassing servers, clusters, complicated grids, and now clouds. In the world of scientific computing, the cloud is the next evolutionary step, and it promises to help companies achieve major operational and strategic objectives. Workload optimization across resources, sophisticated business policies, greater access to IT resources, and low-latency processing all combine to shorten the time to results – a pressing need in the life sciences. The result is faster, better science and enhanced productivity in drug R&D and healthcare delivery. Just as important, the HPC cloud and shared computing bring stronger control over IT infrastructure cost growth by consolidating resources, deploying improved monitoring and reporting, and enabling 'bursting' to external resources as required. Overall, the HPC cloud improves competitiveness and drives business benefit. The IBM Platform Computing portfolio of HPC cloud-enabling products provides innovative tools to reach these goals.

For more information about IBM Platform Computing cloud solutions, visit the IBM Platform Computing website at ibm.com/platformcomputing

Endnotes

i NIST cloud computing definition, http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
ii IBM acquired Platform Computing in January 2012, http://www-03.ibm.com/press/us/en/pressrelease/36372.wss
iii "DNA Sequencing Caught in Deluge of Data", New York Times, Nov. 30, 2011, http://www.nytimes.com/2011/12/01/business/dna-sequencing-caught-in-deluge-of-data.html?_r=1&ref=science
iv International Human Genome Sequencing Consortium, http://www.genome.gov/11006939
v WTSI, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3228552/
vi Intel Many Integrated Core Architecture, http://www.intel.com/content/www/us/en/architecture-and-technology/many-integrated-core/intel-many-integrated-core-architecture.html
vii Current workload managers supported include Grid Engine and TIBCO GridServer (formerly DataSynapse GridServer).
viii Quantitative Structure-Activity Relationship (QSAR) analysis.
ix For a detailed perspective, see Ronald Taylor, "An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics", http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040523/