QuickSpecs
NVIDIA Tesla GPU Computing Modules for HP ProLiant Servers

Overview

HP supports, on select HP ProLiant servers, computational accelerator modules based on NVIDIA® Tesla™ Graphics Processing Unit (GPU) technology. The following Tesla Computing Modules are available from HP for use in HP ProLiant SL390s servers:

NVIDIA Tesla M2050 2-Slot Passive Module
NVIDIA Tesla M2070 2-Slot Passive Module
NVIDIA Tesla M2075 2-Slot Passive Module
NVIDIA Tesla M2070Q 2-Slot Passive Module
NVIDIA Tesla M2090 2-Slot Passive Module

The NVIDIA Tesla M2070Q module can also be used in HP ProLiant WS460c workstation blades.

Based on NVIDIA's CUDA™ architecture, the Tesla M2050, M2070/M2070Q, M2075 and M2090 Computing Modules enable seamless integration of GPU computing with HP ProLiant servers for high-performance computing and large data center, scale-out deployments. The 20-series Tesla GPUs are the first to deliver a peak performance greater than 10X the double-precision horsepower of a quad-core x86 CPU and the first to deliver ECC memory. The Tesla M2050, M2070/M2070Q, M2075 and M2090 modules deliver all of the standard benefits of GPU computing while enabling maximum reliability and tight integration with system monitoring and management tools such as HP Cluster Management Utility.

The Tesla M2070Q uses the NVIDIA Fermi GPU, which combines Tesla's high performance computing (found in the Tesla M2050, M2070, M2075 and M2090 cards) with NVIDIA Quadro® professional-class visualization in the same GPU. The Tesla M2070Q is the ideal solution for customers who want to deploy high performance computing together with advanced and remote visualization in the same datacenter.

The HP GPU Ecosystem includes HP Cluster Platform specification and qualification, HP-supported GPU-aware cluster software, and third-party GPU-aware cluster software for NVIDIA Tesla GPU Computing Modules on HP ProLiant Servers.
In particular, the HP Cluster Management Utility (CMU) monitors and displays GPU health sensors such as temperature. CMU also installs and provisions the GPU drivers and the CUDA software. The HP HPC Linux Value Pack includes a GPU-enhanced version of Platform LSF, which can schedule jobs based on GPU requirements. The HP HPC Linux Value Pack also includes a GPU-enhanced version of HP-MPI, which can set up optimized affinities between specific cores and specific GPUs.

DA - 13743 Worldwide - Version 7 - August 30, 2011

What's New

Support for NVIDIA Tesla M2075 6 GB Module. The NVIDIA M2075 Module has a peak power consumption of 200 Watts, compared to the 225 Watt peak power consumption of the other modules.

Models

NVIDIA Passive Tesla Modules

NVIDIA Tesla M2050 3GB Module
NOTE: 2-slot passively cooled second-generation Tesla module with 3 GB memory.

NVIDIA Tesla M2070Q 6GB GPU Graphics Module
NVIDIA Tesla M2070 6GB Module
NVIDIA Tesla M2075 6GB Computational Module
NVIDIA Tesla M2090 6GB Module
NOTE: 2-slot passively cooled second-generation Tesla modules with 6 GB memory.
NOTE: See the HP ProLiant SL390s Generation 7 (G7) server or HP ProLiant WS460c (G6) Workstation Blade QuickSpecs for configuration details:
http://h18004.www1.hp.com/products/quickspecs/13713_div/13713_div.html
http://h18004.www1.hp.com/products/quickspecs/13429_div/13429_div.html

Part numbers: SH885B, A0C39A, SH886A, A0R41A, A0K01A

Standard Features

M2050, M2070/M2070Q, M2075 and M2090 Computing Modules

Performance of M2050, M2070/M2070Q and M2075 Computing Modules

448 CUDA cores
515 Gigaflops of double-precision peak performance in each GPU
Single-precision peak performance is over one Teraflop per GPU.
GDDR5 memory optimizes performance and reduces data transfers by keeping large data sets in local memory (6 GB on the M2070 and M2075 modules) that is attached directly to the GPU.
The NVIDIA Parallel DataCache™ accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand. This includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.
The NVIDIA GigaThread™ Engine maximizes throughput through context switching that is 10X faster than on the M1060 module, concurrent kernel execution, and improved thread block scheduling.
Asynchronous transfer turbocharges system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize computing efficiency by transferring data to local memory before it is needed.
The high-speed PCIe Gen 2.0 data transfer maximizes bandwidth between the HP ProLiant server and the Tesla processors.
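
The benefit of asynchronous transfer can be estimated with a simple timing model. The sketch below is illustrative only and is not taken from HP or NVIDIA material: it assumes a data set split into equal chunks, a single transfer direction (host to GPU), and constant per-chunk transfer and compute times. The function names and the example timings are hypothetical.

```python
def serial_time(n_chunks, transfer_s, compute_s):
    """Total time when each chunk is transferred, then computed, one after another."""
    if n_chunks < 0:
        raise ValueError("n_chunks must be non-negative")
    return n_chunks * (transfer_s + compute_s)


def overlapped_time(n_chunks, transfer_s, compute_s):
    """Total time when the transfer of chunk i+1 overlaps the compute of chunk i.

    Two-stage pipeline model (assumption): the first transfer and the last
    compute cannot be hidden; every step in between costs the slower stage.
    """
    if n_chunks < 0:
        raise ValueError("n_chunks must be non-negative")
    if n_chunks == 0:
        return 0.0
    return transfer_s + (n_chunks - 1) * max(transfer_s, compute_s) + compute_s


if __name__ == "__main__":
    # Hypothetical workload: 4 chunks, 1 s to transfer and 3 s to compute each.
    n, t_xfer, t_comp = 4, 1.0, 3.0
    print("serial    :", serial_time(n, t_xfer, t_comp), "s")
    print("overlapped:", overlapped_time(n, t_xfer, t_comp), "s")
```

In this model the overlapped run approaches the cost of the slower stage alone as the number of chunks grows, which is why transferring data before it is needed helps even transfer-heavy workloads.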

Performance of the M2090 Computing Module

512 CUDA cores
665 Gigaflops of double-precision peak performance in each GPU
1330 Gigaflops of single-precision peak performance in each GPU
GDDR5 memory optimizes performance and reduces data transfers by keeping large data sets in 6 GB of local memory that is attached directly to the GPU.
The NVIDIA Parallel DataCache™ accelerates algorithms such as physics solvers, ray-tracing, and sparse matrix multiplication where data addresses are not known beforehand. This includes a configurable L1 cache per Streaming Multiprocessor block and a unified L2 cache for all of the processor cores.
The NVIDIA GigaThread™ Engine maximizes throughput through context switching that is 10X faster than on the M1060 module, concurrent kernel execution, and improved thread block scheduling.
Asynchronous transfer turbocharges system performance by transferring data over the PCIe bus while the computing cores are crunching other data. Even applications with heavy data-transfer requirements, such as seismic processing, can maximize computing efficiency by transferring data to local memory before it is needed.
The high-speed PCIe Gen 2.0 data transfer maximizes bandwidth between the HP ProLiant server and the Tesla processors.

Reliability

ECC Memory meets a critical requirement for computing accuracy and reliability for datacenters and supercomputing centers. It protects data in memory to enhance data integrity and reliability for applications. Register files, L1/L2 caches, shared memory, and DRAM are all ECC protected.
Double-bit errors are detected and can trigger alerts with the HP Cluster Management Utility. Also, the Platform LSF job scheduler, available as part of HP HPC Linux Value Pack, can be configured to report when jobs encounter double-bit errors.
Passive heatsink design eliminates moving parts and cables, which increases mean time between failures.

Programming and Management Ecosystem

The CUDA programming environment has broad support for programming languages and APIs. Choose C, C++, OpenCL, DirectCompute, or Fortran to express application parallelism and take advantage of the innovative "Fermi" architecture. The CUDA software, as well as the GPU drivers, can be installed automatically on HP ProLiant servers by HP Cluster Management Utility.
"Exclusive mode" enables application-exclusive access to a particular GPU. CUDA environment variables enable cluster management software such as the Platform LSF job scheduler (available as part of HP HPC Linux Value Pack) to limit the Tesla GPUs an application can use.
With HP ProLiant servers, application programmers can control the mapping between processes running on individual cores and the GPUs with which those processes communicate. Judicious mappings can optimize GPU bandwidth, and thus overall performance. The technique is described in a white paper available to HP customers at: www.hp.com/go/hpc. A heuristic version of this affinity mapping has also been implemented by HP as an option to the mpirun command, as used for example with HP-MPI, available as part of HP HPC Linux Value Pack.
GPU control is available through the nvidia-smi tool, which lets you control compute mode (e.g. exclusive), enable, disable or report ECC, and check or reset the double-bit error count.
IPMI and iLO gather data such as GPU temperature. HP Cluster Management Utility has incorporated these sensors into its monitoring features so that cluster-wide GPU data can be presented in real time, stored for historical analysis, and easily used to set up management alerts.
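
As an illustration of how an environment variable can limit the GPUs a process sees, the sketch below assigns each process on a node to one GPU and builds an environment containing the standard CUDA variable CUDA_VISIBLE_DEVICES. It is a minimal example under stated assumptions: the round-robin rule is a placeholder and is neither the technique from the HP white paper nor the heuristic built into mpirun, and the helper names and the GPU count of three are hypothetical.

```python
import os


def gpu_for_local_rank(local_rank, gpus_per_node):
    """Pick a GPU index for a process by simple round-robin (assumed policy)."""
    if gpus_per_node <= 0:
        raise ValueError("gpus_per_node must be positive")
    if local_rank < 0:
        raise ValueError("local_rank must be non-negative")
    return local_rank % gpus_per_node


def restrict_to_gpu(gpu_index, env=None):
    """Return a copy of the environment in which only one GPU is visible.

    CUDA_VISIBLE_DEVICES is read by the CUDA runtime when the application
    starts, so it must be set before the GPU program is launched.
    """
    new_env = dict(os.environ if env is None else env)
    new_env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return new_env


if __name__ == "__main__":
    # Hypothetical node with 3 GPUs running 6 processes.
    for rank in range(6):
        gpu = gpu_for_local_rank(rank, gpus_per_node=3)
        env = restrict_to_gpu(gpu, env={})
        print("local rank", rank, "-> CUDA_VISIBLE_DEVICES =", env["CUDA_VISIBLE_DEVICES"])
```

A launcher could pass the returned dictionary as the environment of each child process. A production mapping would also account for which CPU socket each GPU is attached to, which this sketch ignores.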

Supported Operating Systems

RHEL 5
RHEL 6
SLES 11
Windows Server 2008

Supported Servers and Workstation Blades

HP ProLiant SL390s G7 (M2050, M2070/M2070Q, M2075, M2090)
NOTE: The ambient temperature for SL390s 2U systems with between one and three NVIDIA M2090 GPUs must be 30 degrees Celsius or less. The ambient temperature for SL390s 4U systems with between five and eight NVIDIA M2090 GPUs must be 30 degrees Celsius or less. All other SL390s systems may be operated with ambient temperatures up to 35 degrees Celsius.
HP ProLiant WS460c G6 (M2070Q only)

HP Services and Support

The NVIDIA Tesla GPU Computing Module is covered by a one-year, parts-exchange-only warranty, or by the warranty of the server or chassis it is attached to and for which it is qualified. Enhancements to warranty services are available for the server and chassis through Flexible Care Pack services.
NOTE: For more information, visit HP Care Pack Services at: http://www.hp.com/services

Optional Features

HP High Performance Clusters

HP Cluster Platforms
The NVIDIA Tesla GPU Computing Modules are optional components of the HP Cluster Platforms: specifically engineered, factory-integrated, large-scale ProLiant clusters optimized for High Performance Computing, with a choice of servers, networks and software. Operating system options include specially priced offerings for Red Hat Enterprise Linux and Novell SLES, as well as Microsoft Windows HPC Server. Compliance with the HP Cluster Platform specification is verified using the HP Cluster Test diagnostic suite, which includes GPU diagnostics. A Cluster Platform Configurator simplifies ordering. [http://www.hp.com/go/clusters]

HP HPC Interconnects
High Performance Computing (HPC) interconnect technologies are available for ProLiant servers as part of the HP Cluster Platform portfolio.
These high-speed InfiniBand and Gigabit interconnects are fully supported by HP when integrated within an HP cluster. Flexible, validated solutions can be defined with the help of configuration tools. [http://www.hp.com/techservers/clusters/ucp/index.html]

HP Cluster Management
HP Cluster Management Utility (CMU) is an HP-licensed and HP-supported suite of tools used to manage large-scale Linux ProLiant systems. CMU includes software for the centralized provisioning, management and monitoring of nodes as well as the HP Tesla GPU Computing Modules. CMU makes the administration of clusters user friendly, efficient, and effective. [http://www.hp.com/go/cmu]

HP HPC Linux Value Pack
HP HPC Linux Value Pack (Value Pack) is an HP-licensed and HP-supported, specially priced software bundle for the development and deployment of applications on HPC Cluster Platforms. Value Pack includes the Platform HPC Enterprise Edition suite of tools, including the LSF workload scheduler and the HP-MPI parallelization library. Also included are the HP Unified Parallel C compiler and the HP Shmem library, as well as the execution environments for the libraries and compiler.

Third Party GPU Cluster and Development Software
More applications and development tools for general-purpose GPU-enabled systems become available every week. Examples of software available from various vendors are listed below.

PGI Accelerator: Fortran and C Compilers (directive-based generation of CUDA code, plus a CUDA Fortran compiler)
CAPS HMPP C and Fortran to CUDA C Compiler (directive-based generation of CUDA code)
TotalView Dynamic Source Code and Memory Debugging for C, C++ and FORTRAN HPC Applications
Allinea DDT Distributed Debugging Tool
Wolfram Mathematica mathematical analysis software
Altair PBS Professional workload scheduler
Platform LSF workload scheduler and Platform Cluster Manager
Adaptive Computing Moab Cluster Suite
Microsoft Windows HPC Server 2008

Related Options

HP High Performance Cluster Models

HP Cluster Management Utility Compute Node Flexible License
NOTE: This part number can be used to purchase one certificate for multiple licenses with a single activation key. Each license is for one node (server). Customer will receive a printed end user license agreement and license entitlement certificate via physical shipment. The license entitlement certificate must be redeemed online in order to obtain a license key.
NOTE: For additional license kits please see the QuickSpecs at: http://h18004.www1.hp.com/products/quickspecs/12612_div/12612_div.html

HP Cluster Management Utility License and Media
NOTE: Order a minimum of one license per cluster. This part number purchases media, including software and documentation, which will be delivered to the customer, and also licenses CMU management. No license key is delivered or required.
NOTE: For additional license kits please see the QuickSpecs at: http://h18004.www1.hp.com/products/quickspecs/12612_div/12612_div.html

HP High Performance Computing Linux Value Pack 1 Processor Flexible License
NOTE: This part number can be used to purchase one certificate for multiple licenses with a single activation key. Each license is for one socket (a.k.a. processor).
Customer will receive a printed end user license agreement and license entitlement certificate via physical shipment. The license entitlement certificate must be redeemed online in order to obtain a license key.
NOTE: For additional license kits please see the QuickSpecs at: http://h18004.www1.hp.com/products/quickspecs/13485_div/13485_div.html

HP High Performance Computing Linux Value Pack Media Kit
NOTE: This part number can be used to purchase media including software and documentation, which will be delivered to the customer.
NOTE: For additional license kits please see the QuickSpecs at: http://h18004.www1.hp.com/products/quickspecs/13485_div/13485_div.html

Part numbers: QL803A, 433257-B21, TC293B, TC294A

Technical Specifications

Form Factor: 9.75 in (24.8 cm) PCIe x16 form factor
Number of Tesla GPUs: 1
Double Precision floating point performance (peak):
  Tesla M2050, M2070, M2070Q, M2075: 515 Gflops
  Tesla M2090: 665 Gflops
Single Precision floating point performance (peak):
  Tesla M2050, M2070, M2070Q, M2075: 1.03 Tflops
  Tesla M2090: 1.33 Tflops
Total Dedicated Memory:
  Tesla M2050: 3GB GDDR5
  Tesla M2070, M2070Q, M2075: 6GB GDDR5
  Tesla M2090: 6GB GDDR5
Memory Interface: 384-bit
Memory Bandwidth:
  Tesla M2050, M2070, M2070Q, M2075: 148 GB/sec
  Tesla M2090: 178 GB/sec
Power Consumption:
  Tesla M2075: 200W TDP
  Tesla M2050, M2070, M2070Q: 225W TDP
  Tesla M2090: 250W TDP
System Interface: PCIe x16 Gen2
Thermal Solution: Passive heatsink cooled by host system airflow

Environment-friendly Products and Approach

End-of-life Management and Recycling
Hewlett-Packard offers end-of-life HP product return, trade-in, and recycling programs in many geographic areas. For trade-in information, please go to: http://www.hp.com/go/green. To recycle your product, please go to: http://www.hp.com/go/green or contact your nearest HP sales office.
Products returned to HP will be recycled, recovered or disposed of in a responsible manner.
The EU WEEE directive (2002/96/EC) requires manufacturers to provide treatment information for each product type for use by treatment facilities. This information (product disassembly instructions) is posted on the Hewlett Packard web site at: http://www.hp.com/go/green. These instructions may be used by recyclers and other WEEE treatment facilities as well as HP OEM customers who integrate and re-sell HP equipment.

© Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. Windows and Microsoft are registered trademarks of Microsoft Corp., in the U.S. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.