Intelligent Storage and Data Management with IBM General Parallel File System (GPFS)
Maciej Remiszewski
IBM Forum 2012 – Estonia, Tallinn, October 9, 2012

Red Bull STRATOS – redbullstratos.com/live/

Critical IT Trends for Technical Computing Users
• Explosion of data – how do you spot trends, predict outcomes and take meaningful action?
• Inflexible IT infrastructures – how do you manage inflexible, siloed systems and business processes to improve business agility?
• Escalating IT complexity – how do you manage IT costs and complexity while speeding time-to-market for new services?

Introducing the new IBM Technical Computing portfolio: powerful, comprehensive, intuitive
• Solutions: Integrated Solutions, Industry Solutions, HPC Cloud, Big Data
• Software: Parallel Environment Runtime Edition, Parallel Environment Developer Edition, Platform MPI, Platform Cluster Manager, Platform LSF, Platform Symphony, Platform HPC, Platform Application Center, Engineering and Scientific Libraries, GPFS
• Systems and storage: Intelligent Cluster, BG/Q, iDataPlex, PureFlex, System x & BladeCenter, P7-775, LTO tape, 3592 automation, SONAS, DS5000, DS3000, DCS3700

NEW: HPC Cloud Solutions
Overview
• Innovative solutions for dynamic, flexible HPC cloud environments
What's New
• New LSF add-on, IBM Platform Dynamic Cluster V9.1:
o Workload-driven dynamic node re-provisioning
o Dynamically switches nodes between physical and virtual machines
o Automated job checkpointing and migration
o Smart, flexible policy and performance controls
• Enhanced Platform Cluster Manager – Advanced capabilities
• New complete, end-to-end solutions
Use Case 1: HPC infrastructure management
• Self-service cluster provisioning and management
• Consolidate resources into an HPC cloud
• Cluster flexing
Use Case 2: Self-service HPC
• Self-service job submission and management
• Dynamic provisioning
• Job migration and/or checkpoint-restart
• 2D/3D remote visualization
Use Case 3: Cloud bursting
• "Burst" internally to available resources
• Burst externally to cloud providers

NEW: Financial Risk and Crimes Solution
Overview
• High-performance, low-latency integrated risk solution stack with Platform Symphony Advanced Edition and partner products, including:
o IBM InfoSphere BigInsights and IBM Algorithmics
o Third-party partner products: Murex and Calypso
What's New
• New solution stacks to manage and process big data with speed and scale
• Sales tools that highlight the value of IBM Platform Symphony:
o Financial risk: customer testimonial videos; inclusion in SWG risk frameworks and S&D blueprints
o Financial crime: with BigInsights for credit-card fraud analytics
o TCO tool and benchmarks
Use Case 1: Financial risk, including Credit Value Adjustment (CVA) analytics
• Accelerates compute-intensive workloads up to 4x, e.g. Monte Carlo simulations and Algorithmics RiskWatch "cube" simulations (the task shape is sketched below)
• Integrated with IBM Algorithmics, Murex and Calypso
• High throughput: 17K tasks/sec
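To make the Monte Carlo use case concrete, here is a minimal, self-contained Python sketch of the kind of embarrassingly parallel pricing task a grid scheduler such as Platform Symphony fans out. The model (geometric Brownian motion) and all parameters are illustrative assumptions; this is not the Symphony or Algorithmics API.

```python
# Illustrative only: the shape of a CVA-style Monte Carlo task, not the
# Platform Symphony or Algorithmics API. Each call is independent, so a
# grid scheduler can dispatch thousands of such tasks per second.
import math
import random

def exposure_batch(spot, rate, vol, years, steps, n_paths, seed):
    """One grid task: mean positive exposure at horizon under a
    geometric Brownian motion price model (assumed for illustration)."""
    rng = random.Random(seed)
    dt = years / steps
    total = 0.0
    for _ in range(n_paths):
        s = spot
        for _ in range(steps):
            z = rng.gauss(0.0, 1.0)
            s *= math.exp((rate - 0.5 * vol * vol) * dt
                          + vol * math.sqrt(dt) * z)
        total += max(s - spot, 0.0)   # count positive exposure only
    return total / n_paths

if __name__ == "__main__":
    # Run eight "tasks" locally; on a grid each would go to a worker node.
    estimates = [exposure_batch(100.0, 0.02, 0.30, 1.0, 50, 1000, seed)
                 for seed in range(8)]
    print("mean exposure estimate:", sum(estimates) / len(estimates))
```

Because the tasks share no state, throughput scales with the number of workers, which is why a dispatch rate like the 17K tasks/sec figure above matters for this workload.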
Use Case 2: Big data for financial crimes
• Accelerates analysis of data for fraud and irregularities
• Supports IBM InfoSphere BigInsights
• Faster than the stock Apache Hadoop distribution

NEW: Technical Computing for Big Data Solutions
Overview
• High-performance, low-latency "big data" solution stack featuring Platform Symphony, GPFS, DCS3700 and Intelligent Cluster – proven across many industries
• Low-latency Hadoop stack with Platform Symphony Advanced Edition and InfoSphere BigInsights (the programming model is sketched below)
What's New
• New solution stacks to manage and process big data with speed and scale, built on IBM General Parallel File System (GPFS) and IBM DCS3700 / IBM Intelligent Cluster
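For readers new to the programming model behind the stack above, here is a minimal single-process Python sketch of map-reduce. Real frameworks such as Hadoop, or Symphony's low-latency MapReduce support, shard the input and run many mappers and reducers in parallel across the cluster; this tiny word count only shows the model itself.

```python
# Minimal single-process sketch of the map-reduce model; cluster
# frameworks run many mappers and reducers in parallel over sharded data.
from collections import defaultdict

def mapper(document):
    """Map step: emit (word, 1) for every word in one input record."""
    for word in document.split():
        yield word.lower(), 1

def reducer(pairs):
    """Reduce step: sum the counts emitted for each distinct key."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

documents = ["Big data needs fast storage",
             "fast analytics on big data"]
intermediate = [kv for doc in documents for kv in mapper(doc)]
print(reducer(intermediate))
# e.g. {'big': 2, 'data': 2, 'needs': 1, 'fast': 2, ...}
```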
IBM is delivering a Smarter Computing foundation for Technical Computing
• Designed for data: achieve faster time to insight with scalable, low-latency data access and control for big data analytics
• Tuned to the task: increase throughput and utilization, and lower operating costs, with workload-optimized systems and intelligent resource management
• Managed with cloud technologies: optimize agility with on-demand, workload-driven dynamic clusters, grids and HPC clouds

Technical Computing software: simplified management, optimized performance
The backbone of Technical Computing: IBM acquired Platform Computing, a leader in cluster, grid and HPC cloud management software
• 20-year history delivering leading management software for technical computing and distributed analytics environments
• Enables thousands of systems to be used and managed as one – powering the evolution from clusters to grids to HPC clouds
• 2,000+ global customers, including 23 of the 30 largest enterprises
• Market-leading scheduling engine with high performance, mission-critical reliability and extreme scalability
• Comprehensive capability footprint, from ready-to-deploy complete cluster systems to large global grids
• Heterogeneous systems support
• Large ISV and global partner ecosystem
• Global services and support coverage
De facto standard for commercial HPC: 60% of the top financial services firms; over 5 million CPUs under management
As of June 2012, the IBM Platform Computing™ portfolio is ready to deploy!

IBM Platform Computing can help accelerate your application results
For technical computing and analytics distributed computing environments:
• Optimizes workload management: batch and highly parallel workloads; policy- and resource-aware scheduling; service-level agreements; automation and workflow
• Aggregates resource pools: compute- and data-intensive applications; heterogeneous resources; physical, virtual and cloud; easy user access
• Delivers shared services: multiple user groups and sites; multiple applications and workloads; governance; administration, reporting and analytics
• Transforms static infrastructure into dynamic: workload-driven dynamic clusters; bursting "into the cloud"; enhanced self-service and on-demand; multi-hypervisor and multi-boot

Clients span many industries
Platform LSF – "Platform Computing came to us as a true innovation partner: not a supplier of technology, but a partner who was able to understand our problems, provide appropriate solutions, and work with us to continuously improve the performance of our system."
– Steve Nevey, Business Development Manager, Red Bull Technology (watch the Red Bull video)
Platform Symphony – "Platform Computing and its enterprise grid solution enable us to share a formerly heterogeneous and distributed hardware infrastructure across applications regardless of their location, operating system and application logic … helping us to achieve our internal efficiency targets while at the same time improving our performance and service quality."
– Lorenzo Cervellin, Head of Global Markets and Treasury Infrastructure, UniCredit Global Information Services
Platform HPC – "Platform's software was a clear leader from the beginning of the process."
– Chris Collins, Head of Research & Specialist Computing, University of East Anglia

IBM also offers the most widely used, commercially available, technical computing data management software
IBM General Parallel File System (GPFS™) – a scalable, highly available, high-performance file system optimized for multi-petabyte storage management
• Cluster file system: all nodes access the data
• Virtualized access to data: centrally deployed, managed, backed up and grown
• Seamless capacity and performance scaling

GPFS pioneered big data management
Extreme scalability
• Up to 2^63 files per file system
• Maximum file system size: 2^99 bytes; maximum file size equals the file system size
• Largest production file system to date: 5.4 PB
• 1 to 8,192 nodes
Proven reliability
• No special nodes: every node has equal access to data; administer from any node
• Add and remove nodes on the fly; rolling upgrades
• Data replication
Performance
• High-performance metadata
• Data striped across nodes and storage
• Integrated tiered storage

IBM innovation continues with GPFS Active File Management (AFM) for a global namespace
• 1993: GPFS introduced concurrent file system access from multiple nodes
• 2005: Multi-cluster expanded the global namespace by connecting multiple sites
• 2011: AFM takes the global namespace truly global by automatically managing asynchronous replication of data (the write-back idea is sketched below)
AFM eliminates data islands, spanning storage-rich servers, NSD/GPFS Native RAID (GNR), legacy HPC storage, shared-nothing clusters (SNC), MapReduce farms and legacy NFS storage
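As a conceptual illustration of the asynchronous replication idea behind AFM: writes complete at local "cache" speed while a background process pushes updates to the remote "home" site. This Python sketch shows only that write-back pattern; it is not GPFS code, and the paths, queue and in-memory home store are assumptions.

```python
# Conceptual sketch of asynchronous write-back replication, the idea
# behind AFM's cache/home relationship. Not GPFS internals.
import queue
import threading
import time

pending = queue.Queue()
home_site = {}            # stand-in for the remote "home" cluster

def local_write(path, data):
    """Application write: completes at local (cache) speed."""
    print(f"local write completed: {path}")
    pending.put((path, data))      # replication happens later

def replicator():
    """Background worker pushing queued updates over the 'WAN'."""
    while True:
        path, data = pending.get()
        time.sleep(0.1)            # stand-in for WAN latency
        home_site[path] = data
        print(f"replicated to home site: {path}")
        pending.task_done()

threading.Thread(target=replicator, daemon=True).start()
local_write("/gpfs/cache/results.dat", b"simulation output")
pending.join()   # AFM flushes automatically and by policy; we wait here
                 # only so this small demo sees the push complete
```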
How can GPFS deliver value to your business?
GPFS addresses six recurring needs: speeding time-to-market through faster analytics; knowledge management and efficiency through file sharing; reducing storage costs through lifecycle management; maintaining business continuity through disaster recovery; business flexibility with cloud storage; and innovation with MapReduce/Hadoop.

Speed time-to-market with faster analytics
• Issue: we are in the era of "smarter analytics"
– The data explosion makes I/O a major hurdle
– Deep analytics result in longer-running workloads
– There is demand for lower-latency analytics to beat the competition
• GPFS was designed for complex and/or large workloads accessing lots of data:
– Real-time disk scheduling and load balancing ensure all relevant information and data can be ingested for analysis
– Built-in replication ensures that deep analytics workloads can continue running should a hardware or low-level software failure occur
– The distributed design means it can scale as needed

Reduce storage costs through lifecycle management
• Issue:
– Storage costs increase as dormant files sit on spinning disk
– Redundant files are stored across the enterprise to ease access
– User file requirements must be aligned with the cost of storage
• GPFS has policy-driven, automated tiered storage management for optimizing file location (see the sketch below):
– ILM tools manage sets of files across pools of storage based upon user requirements
– Tiering spans different economic classes of storage – SSD, spinning disk, tape – regardless of physical location
– Interfaces to external storage subsystems such as TSM and HPSS extend the ILM capability enterprise-wide
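GPFS itself expresses these policies in an SQL-like rule language evaluated by its policy engine. The following Python sketch only illustrates the decision logic of an age-based migration rule; the pool names and thresholds are assumptions, not GPFS syntax.

```python
# Illustration of age-based tiering logic, the kind of migration decision
# a GPFS placement/migration policy encodes. GPFS uses SQL-like policy
# rules, not Python; pool names and thresholds here are assumptions.
import os
import time

TIER_RULES = [                 # (pool, keep here if accessed within N days)
    ("system_ssd", 7),
    ("nearline_disk", 90),
    ("tape_via_tsm", None),    # everything colder goes to tape / TSM
]

def choose_pool(path, now=None):
    """Pick a target pool from a file's last-access age."""
    now = now if now is not None else time.time()
    age_days = (now - os.stat(path).st_atime) / 86400.0
    for pool, max_age in TIER_RULES:
        if max_age is None or age_days <= max_age:
            return pool

if __name__ == "__main__":
    for name in sorted(os.listdir(".")):
        if os.path.isfile(name):
            print(f"{name} -> {choose_pool(name)}")
```

The real policy engine scans file metadata in parallel across the cluster, which is what makes rules like this practical at petabyte scale.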
Maintain business continuity through disaster recovery
• Issue:
– Need for real-time or low-latency file access
– File data sits in geographic areas susceptible to downtime
– File-based information is fragmented across a wide geographic area
• GPFS has inherent features designed to ensure high availability of file-based data:
– Remote file replication with built-in failover
– Multi-site clustering reduces the risk to stored data via the WAN
– Space-efficient, point-in-time snapshot views of the file system enable quick recovery

Innovate with big data and MapReduce/Hadoop
• Issue:
– Unlocking value in large volumes of unstructured data
– Mission-critical applications require enterprise-tested reliability
– Customers are looking for alternatives to the Hadoop Distributed File System (HDFS) for map-reduce applications
• An active IBM Research project, GPFS-SNC, aims to provide a robust alternative to HDFS:
– HDFS has a centralized metadata service that is a single point of failure, unlike the distributed design of GPFS
– GPFS's POSIX compliance expands the range of applications that can access files (read, write, append), whereas HDFS cannot append to or overwrite files (see the sketch below)
– GPFS provides all of the rich ILM features for high availability and storage management; HDFS does not
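To see the POSIX point in practice: on a GPFS mount, any ordinary program can create, append to, or overwrite a file in place through standard OS calls – operations that HDFS of this era did not allow. The mount path below is hypothetical.

```python
# Ordinary POSIX-style file updates on a (hypothetical) GPFS mount.
# HDFS of this era could not append to or overwrite an existing file.
path = "/gpfs/fs1/app/log.txt"   # hypothetical GPFS mount point

with open(path, "w") as f:       # create / truncate
    f.write("first line\n")

with open(path, "a") as f:       # append in place
    f.write("appended line\n")

with open(path, "r+") as f:      # overwrite in place
    f.seek(0)
    f.write("FIRST")
```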
Knowledge management and efficiency through file sharing
• Issue:
– Geographically dispersed employees need access to the same set of file-based information
– "Follow-the-sun" product engineering and development processes (CAD, CAE, etc.) must be supported
– The workflow of highly fragmented, geographically dispersed file data generated by employees must be managed and integrated
• GPFS global namespace support and Active File Management provide the core capabilities for file sharing:
– The global namespace enables a common view of files no matter where the requestor, or the file, resides
– Active File Management handles file version control to ensure integrity
– Parallel data access allows large numbers of files and people to collaborate without performance impact

Intelligent Cluster, System x, iDataPlex: optimized platforms to right-size your Technical Computing operations

IBM leadership for a new generation of Technical Computing
Technical Computing is no longer just the domain of large problems:
– Businesses of all sizes need to harness the explosion of data for business advantage
– Workgroups and departments are increasingly using clustering at smaller scale to drive new insights and better business outcomes
– Smaller groups lack the skills and resources to deploy and manage such systems effectively
IBM brings its supercomputing experience to smaller workgroup and department clusters with IBM Intelligent Cluster™:
– Reference solutions for simple deployment across a range of applications
– Simplified end-to-end deployment and resource management with Platform HPC software
– Factory-integrated and installed by IBM
– Supported as an integrated solution
– Now even easier with IBM Platform Computing
IBM intelligence for clusters of all sizes!

IBM Intelligent Cluster™ – it's about faster time-to-solution
Take the time and risk out of Technical Computing deployment.
Building blocks: industry-leading IBM and third-party components – management servers, compute nodes, networking, storage, OS and cluster management.
A factory-integrated, interoperability-tested system with compute, storage, networking and cluster management tailored to your requirements and supported as a solution: design, build, test, install, support.
Allows clients to focus on their business, not their IT – backed by IBM.

IBM Intelligent Cluster simplifies large and small deployments
• Research – large: LRZ SuperMUC, a Europe-wide research cluster; 9,587 servers, direct water cooled. Small: University of Chile, earthquake prediction and astronomy; 56 servers, air cooled
• Media – large: Illumination Entertainment, 3D feature-length movies; 800 iDataPlex servers, Rear-Door Heat eXchanger cooled. Small: Kantana Animation Studios, Thailand television production; 36 iDataPlex servers, air cooled

Technical Computing storage: complete, scalable, dense solutions from a single vendor
IBM System Storage® for Technical Computing – middleware and tools (GPFS), storage (DCS3700, LTO tape, SONAS, DS5000) and services
• Complete, scalable, integrated solutions from a single vendor
• Scaling to multiple petabytes and hundreds of gigabytes per second
• Industry-leading data management software and services
• Big Green features lower overall costs
• Worldwide support and service

IBM's densest storage solution just got better: IBM System Storage DCS3700 Performance Module
A 6 Gb/s x4 SAS-based storage system with expandable performance, scalability and density, starting at entry-level prices
• Powerful hardware platform:
o 2.13 GHz quad-core processor
o 12, 24 or 48 GB cache per controller pair
o 8 base 8 Gb FC ports per controller pair; additional host port options via host interface cards
• Drastically improved performance
• Supports up to 360 drives
• Fully supports features in recent and upcoming releases (10.83 feature set): Dynamic Disk Pools (DDP), Enhanced FlashCopy, FlashCopy consistency groups, thin provisioning, ALUA, VAAI

Expanded capabilities of IBM's densest storage solution: IBM System Storage DCS3700, now with the Performance Module option
• New DCS3700 Performance Controller: a high-density storage system designed for general-purpose computing and high-performance technical computing applications
• IBM's densest disk system: 60 drives and dual controllers in 4U, now scaling to over 1 PB per system with 3 TB drives
• New Dynamic Disk Pooling feature enables easy-to-configure, worry-free storage, reducing maintenance requirements and delivering consistent performance
• New thin provisioning, ALUA, VAAI and Enhanced FlashCopy features deliver increased utilization, higher efficiency and better performance
• Superior serviceability and easy installation with front-load drawers
• Bulletproof reliability and availability designed to ensure continuous high-speed data delivery
The DCS3700 can scale in clusters with IBM GPFS™
• Combining IBM's GPFS clustered file management software with the DCS3700 creates an extremely scalable and dense file-based storage system
• Using a flexible architecture, "building blocks" of DCS3700 + GPFS can be combined; the table below gives the measured figures, and the sketch after it extrapolates from them

Configuration               Single building block        Two building blocks
Hardware                    2 GPFS x3650 servers,        4 GPFS x3650 servers,
                            3 DCS3700                    6 DCS3700
Capacity (raw / usable)     360 TB / 262 TB              720 TB / 524 TB
Streaming (write / read)    up to 4.8 / 5.5 GB/s         up to 9.6 / 11.0 GB/s
IOPs at 4K (write / read)   3,600 / 6,000 IOP/s          7,200 / 12,000 IOP/s
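The two-building-block column is exactly double the single block, so a quick extrapolation under an assumed linear-scaling model is straightforward. The sketch below is back-of-the-envelope only, not a sizing tool.

```python
# Back-of-the-envelope extrapolation from the single-building-block
# figures above, assuming the linear scaling the two-block column shows.
# Not a sizing tool: real deployments are limited by networks, servers
# and workload mix.
PER_BLOCK = {
    "raw_tb": 360, "usable_tb": 262,
    "write_gb_s": 4.8, "read_gb_s": 5.5,
    "write_iops": 3600, "read_iops": 6000,
}

def estimate(blocks):
    """Aggregate capacity and throughput for N building blocks."""
    return {metric: value * blocks for metric, value in PER_BLOCK.items()}

print(estimate(2))   # matches the published two-building-block column
print(estimate(4))   # hypothetical four-block farm under linear scaling
```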
Customer Success Stories: applying IBM technology and experience to solve real-world issues and deliver value

National Digital Archive (NDA) – Poland: heritage and cultural preservation for society
The need: NDA needed a cost-effective IT solution that it could use to significantly increase internal efficiency and archiving capacity. NDA currently holds almost 15 million photographs, 30,000 sound recordings and 2,500 films, and provides free access to these materials.
The solution: The client implemented a solution based on IBM Power Systems servers, IBM System Storage devices and IBM GPFS. To provide scalability for the ongoing work, CompFort Meridian helped NDA implement an IBM Power 750 Express server. Using this system, the client will be able to rapidly expand the digital archive without impeding the performance of the ZoSIA service. NDA also uses IBM General Parallel File System for online storage management and integrated information lifecycle management, and to scale access to the expanding volume of archived material. With this solution, the client can maintain the performance of the ZoSIA service even when numerous users access the same resource at the same time.
The benefit: NDA saved its nearly 290,000 users an estimated $35 million by enabling them to check archives online rather than spending the time and money required to visit NDA in person. The client also gained a high-performance, stable and secure solution to support the ZoSIA online archive system. In addition, with this solution in place, NDA can consolidate national and cultural remembrance and extend a sense of national origin and heritage into the future.
Solution components: IBM Power Systems; IBM General Parallel File System

DEISA: enabling 15 European supercomputing centers to collaborate
The need: DEISA wanted to advance European computational science through close collaboration between Europe's most important supercomputing centers, supporting challenging computing tasks and sharing data across a wide-area network.
The solution: To allow different IBM® and non-IBM supercomputer architectures to access data from across a wide-area network, DEISA worked closely with the IBM Deep Computing team to create a global multi-cluster file system based on IBM General Parallel File System (GPFS™).
The benefit:
• Allows scientists in different countries to share supercomputing resources and collaborate on large-scale projects
• Enables allocation of specific computing tasks to the most suitable supercomputing resources, boosting performance
• Provides a rapid, reliable and secure shared file system for a variety of supercomputing architectures, both IBM and non-IBM
"Our work with IBM GPFS demonstrates the viability and usefulness of a global file system for collaboration and data-sharing between supercomputing centers – even when the individual supercomputing clusters are based on very different technical architectures. The flexibility of GPFS and its ability to support all the different DEISA supercomputers is highly impressive."
– Dr. Stefan Heinzel, Director of the Rechenzentrum Garching at the Max Planck Society
Solution components: IBM Power Systems™; IBM Blue Gene®/P; IBM PowerPC®; several non-IBM supercomputing architectures, including Cray XT5, NEC SX-8 and SGI Altix; IBM® General Parallel File System (GPFS™)

Snecma: HPC to achieve regulatory objectives
The need: To reduce its impact on the environment, Snecma addresses a number of key factors, such as reducing fuel consumption (and therefore greenhouse gas emissions), reducing noise, and choosing environmentally friendly materials for the manufacturing and maintenance of aviation engines. The company was required to meet the "Vision 2020" plan set by the European Community, which defines the European aviation industry's objectives for 2020 and includes ambitious environmental targets:
– a 50% reduction in perceived noise and in CO2 released per passenger-kilometer, compared to the year 2000
– an 80% reduction in nitrogen oxide (NOx) emissions
To meet these objectives, Snecma needed heavy investment in research and development, powered by supercomputers.
The solution: Snecma implemented a powerful high-performance computing (HPC) environment with optimal energy efficiency. The core architectural components were highly dense, low-power server cluster packaging, a low-latency interconnect, and a high-performance parallel file system. Thanks to IBM technologies, Snecma gained a powerful and reliable high-performance computing solution. The new supercomputer will be used by leading-edge researchers for highly complex computations in the aviation field. The simulations carried out on the iDataPlex supercomputer allow Snecma to reduce fuel consumption (and therefore greenhouse gas emissions) and reduce noise, while also addressing the data center energy crisis.
Solution components: IBM System x iDataPlex; IBM General Parallel File System™

Vestas Wind Systems: maximizing power generation and durability in its wind turbines with HPC
The opportunity: This wind technology company relied on the Weather Research and Forecasting modeling system to run its turbine location algorithms, in a process generally requiring weeks and posing inherent data capacity limitations. Poised to begin developing its own forecasts and adding actual historical data from existing customers to the mix of factors used in the model, Vestas needed a solution to its big data challenge that would be faster, more accurate and better suited to its expanding data set. Precise placement of a wind turbine can make a significant difference in the turbine's performance and its useful life. In the competitive new arena of sustainable energy, winning the business can depend on both the value demonstrated in the proposal and the speed of the RFP response.
What makes it smarter: Vestas broke free of its dependency on the standard Weather Research and Forecasting model with a powerful solution that sliced weeks from the processing time and more than doubled the capacity needed to include all the factors it considers essential for accurately predicting turbine success. Using a supercomputer that is one of the world's largest to date, and a modeling solution designed to harvest insights from both structured and unstructured data, the company can factor in temperature, barometric pressure, humidity, precipitation, wind direction and wind velocity from ground level up to 300 feet, along with its own recorded data from customer turbine placements. Other sources to be considered include global deforestation metrics, satellite images, geospatial data, and data on the phases of the moon and tides. The solution raises the bar for due diligence in determining effective turbine placement.
Real business results:
– Reduces the response time for business-user requests from weeks to hours
– Provides the capability to analyze ALL modeling and related data, improving the accuracy of turbine placement
– Reduces the cost per kilowatt-hour for customers and increases the precision of customer ROI estimates
Solution components:
• IBM Technical Computing – General Parallel File System
• IBM InfoSphere® BigInsights Enterprise Edition
• IBM System x®, iDataPlex®
"Today, more and more sites are in complex terrain. Turbulence is a big factor at these sites, as the components in a turbine operating in turbulence are under more strain and consequently more likely to fail. Avoiding these pockets of turbulence means improved cost of energy for the customer."
– Anders Rhod Gregersen, Senior Specialist, Plant Siting & Forecasting

…but it is best to hear from a client directly. Please join me in welcoming Ivar Koppel, deputy director of research.