SGI® Many-core Computing Solutions
Gábor Lehoczki
Silicon Computers Kft.
leho@silicon.hu

Nvidia Tesla series

| Features | M2075 | M2090 | K10 | K20 | K20X |
|---|---|---|---|---|---|
| Number and type of GPU | 1 Fermi GPU | 1 Fermi GPU | 2 Kepler GK104s | 1 Kepler GK110 | 1 Kepler GK110 |
| Peak double precision floating point performance | 515 Gigaflops | 665 Gigaflops | 190 Gigaflops (95 Gflops per GPU) | 1.17 Tflops | 1.31 Tflops |
| Peak single precision floating point performance | 1030 Gigaflops | 1331 Gigaflops | 4577 Gigaflops (2288 Gflops per GPU) | 3.52 Tflops | 3.95 Tflops |
| Memory bandwidth (ECC off) | 150 GBytes/sec | 177 GBytes/sec | 320 GB/sec (160 GB/sec per GPU) | 208 GB/sec | 250 GB/sec |
| Memory size (GDDR5) | 6 GB | 6 GB | 8 GB (4 GB per GPU) | 5 GB | 6 GB |
| CUDA cores | 448 | 512 | 3072 (1536 per GPU) | 2496 | 2688 |

Intel® Xeon Phi™ Coprocessor (code name: Knights Corner)

| | 5110P | 3120A | 3120P | 7120P | 7120X | 5120D |
|---|---|---|---|---|---|---|
| # of cores | 60 | 57 | 57 | 61 | 61 | 60 |
| Clock speed | 1.053 GHz | 1.1 GHz | 1.1 GHz | 1.238 GHz | 1.238 GHz | 1.053 GHz |
| Peak double precision floating point performance | 1.011 TFlops | | | | | |
| Peak single precision floating point performance | 2.022 TFlops | | | | | |
| Cache | 30 MB | 28.5 MB | 28.5 MB | 30.5 MB | 30.5 MB | 30 MB |
| Max TDP | 225 W | 300 W | 300 W | 300 W | 300 W | 245 W |
| Max memory size | 8 GB | 6 GB | 6 GB | 16 GB | 16 GB | 8 GB |
| Max memory bandwidth | 320 GB/s | | | | | |

Additional row (16 channels), values as listed: 5 GB/s, 5 GB/s, 5.5 GB/s, 5.5 GB/s, 5.5 GB/s

SGI Development Suite
• For Linux software development
• Supports 2-4 developers

SGI Performance Suite
• Technical Computing Performance for SGI Systems
• Consists of SGI Accelerate, SGI MPI, SGI REACT, SGI UPC
• Tech support via SGI Support contract

Intel Composer XE
• Software development tool suite tuned for application performance and code robustness
• Consists of C++ & Fortran compilers, Math Kernel Library, Integrated Performance Primitives, Threading Building Blocks
• Includes 1 yr tech support

Rogue Wave TotalView Team
• Dynamic source code, thread, and memory debugging for C, C++ and Fortran HPC applications
• Consists of TotalView, MemoryScape, ReplayEngine, and CUDA debugging
• Includes 1 yr tech support

Intel® Xeon Phi™ compared to a GPU
• Intel® Xeon Phi™ is a multi-core architecture with the following characteristics:
  – based on Pentium (P54C) in-order cores with a vector extension
  – OpenMP programming
  – “UV on a chip”
  – PCIe x16 Gen 2 (~6.7 GB/s)
  – Ease-of-Use
• NVIDIA Fermi GPUs use streaming multiprocessors, currently with 32 SIMD engines each
  – A very high number of threads (tens of thousands) hides memory latency
  – Minimal context switching time between threads
  – HW thread dispatcher
  – PCIe x16 Gen 3 (~11.3 GB/s) [Kepler K10]

GPU-ACCELERATED APPLICATIONS 01 Research: Higher Education and Supercomputing CASINO Code for performing Quantum Monte Carlo GPUs to determine protein binding sites COMPUTATIONAL CHEMISTRY AND BIOLOGY (QMC) electronic structure calculations for Allows fast processing of large ligand NUMERICAL ANALYTICS finite and periodic systems databases PHYSICS TBD Yes Single only 06 Defense and Intelligence CP2K Program to perform atomistic and BUDE Molecular docking program Empirical Free 06 Computational Finance molecular simulations of solid state, liquid, Energy Forcefield Single only 07 Manufacturing: CAD and CAE molecular and biological systems Core Hopping Rapid screening of novel cores to COMPUTATIONAL FLUID DYNAMICS DBCSR (sparse matrix multiply library) Yes improve COMPUTATIONAL STRUCTURAL MECHANICS GAMESS-UK The general purpose ab initio drug properties COMPUTER AIDED DESIGN molecular GPU accelerated application Yes ELECTRONIC DESIGN AUTOMATION electronic structure program for performing FastROCS Molecule shape comparison application 09 Media and Entertainment SCF-, DFT- and MCSCF-gradient Real-time shape similarity searching/ ANIMATION, MODELING AND RENDERING calculations comparison COLOR CORRECTION AND GRAIN (ss|ss) type integrals within calculations Yes MANAGEMENT using Hartree-Fock ab initio methods Interactive Molecule COMPOSITING, FINISHING AND EFFECTS and density functional theory. Supports Visualizer EDITING organics and inorganics. 
Molecular visualization based on a raytracing engine ENCODING AND DIGITAL DISTRIBUTION Yes Written for use only on GPUs Yes ON-AIR GRAPHICS GAMESS-US Computational chemistry suite used to Molegro Virtual Docker 6 Method for performing high ON-SET, REVIEW AND STEREO TOOLS simulate atomic and molecular electronic accuracy SIMULATION structure flexible molecular docking WEATHER AND CLIMATE FORECASTING Libqc with Rys Quadrature Algorithm, Energy grid computation, pose evaluation 13 Oil and Gas Hartree-Fock, MP2 and CCSD and guided differential evolution POPULAR GPU-ACCELERATED APPLICATIONS Yes Single only CATALOG | MAR14 | 01 Gaussian PIPER Protein Docking Protein-protein docking Research: Higher Education and Supercomputing (In development) program Molecule docking TBD COMPUTATIONAL CHEMISTRY AND BIOLOGY Predicts energies, molecular structures, PyMol User-sponsored molecular visualization Molecular Dynamics and vibrational frequencies of molecular system on an open-source foundation APPLICATION DESCRIPTION SUPPORTED systems Lines: 460% increase FEATURES MULTI-GPU SUPPORT Joint NVIDIA, PGI and Gaussian Cartoons: 1246% increase ACEMD GPU simulation of molecular mechanics collaboration Surface: 1746% increase force fields, implicit and explicit solvent Yes Spheres: 753% increase Written for use only on GPUs Yes GPAW Real-space grid DFT code written in C and Ribbon: 426% increase AMBER Suite of programs to simulate molecular Python Single only dynamics on biomolecule Electrostatic poisson equation, VEGA ZZ Molecular Modeling Toolkit Virtual logP, PMEMD Explicit Solvent and GB Implicit orthonormalizing of vectors, residual molecular surface values Single only Solvent minimization method (rmm-diis) VMD Visualization and analyzing large biomolecular Yes Yes systems in 3-D graphics CHARMM MD package to simulate molecular Jaguar A high-performance ab initio package High quality rendering, large structures dynamics on biomolecule for performing gas and solution phase (100M 
atoms), analysis and visualization Implicit (5x), Explicit (2x) Solvent simulations tasks, multiple GPU support for display of via OpenMM. Native CUDA port in TBD Yes molecular orbitals development LATTE Density matrix computations CU_BLAS, SP2 Yes Yes Algorithm Yes Bioinformatics DESMOND High-speed molecular dynamics MOLCAS Methods for calculating general electronic APPLICATION DESCRIPTION SUPPORTED simulations of biological systems on structures in molecular systems in both FEATURES MULTI-GPU SUPPORT conventional commodity clusters. ground and excited states BarraCUDA Sequence mapping software Alignment The code uses novel parallel algorithms CU_BLAS Single only of short sequencing reads, and numerical techniques to achieve high MOLPRO Used for accurate ab initio quantum alignment of indels with gap openings and performance and accuracy chemistry calculations extensions Yes Density-fitted MP2 (DF-MP2), density fitted Yes DL-POLY Simulate macromolecules, polymers, ionic local correlation methods (DF-RHF, DFKS), DFT CUDASW++ Open source software for Smithsystems, etc on a distributed memory Yes Waterman parallel computer MOPAC2013 Semiempirical Quantum Chemistry protein database searches on GPUs Two-body forces, Link-cell pairs, Ewald Pseudodiagonalization, full diagonalization, Parallel search of Smith-Waterman SPME forces, Shake VV and density matrix assembling database Yes Single only Yes Folding@Home A distributed computing project that NWChem Calculations Triples part of Reg-CCSD(T), CUSHAW Parallelized short read aligner Parallel, studies CCSD and accurate long read aligner for protein folding, misfolding, aggregation, and EOMCCSD task schedulers large genomes related diseases Yes Yes Powerful distributed computing molecular Octopus Used for ab initio virtual experimentation 04 | POPULAR GPU-ACCELERATED dynamics system; implicit solvent and and quantum chemistry calculations APPLICATIONS CATALOG | MAR14 folding Full GPU support for ground-state, GATK The 
Genome Analysis Toolkit or GATK Yes real-time calculations; Kohn-Sham is a software package developed at the GPUGrid.net A distributed computing project that Hamiltonian, orthogonalization, subspace Broad Institute to analyse next-generation uses diagonalization, Poisson solver, time resequencing data. The toolkit offers a wide GPUs for molecular simulations propagation variety of tools, with a primary focus on High-performance all-atom biomolecular yes variant discovery and genotyping as well as simulations; explicit solvent and binding Q-CHEM Computational chemistry package designed strong emphasis on data quality assurance. Yes for HPC clusters Variant Calling; > 70X speedup Yes GROMACS Simulation of biochemical molecules with Various features including RI-MP2 TBD GPU-BLAST Local search with fast k-tuple heuristic complicated bond interactions QUICK QUICK is a GPU-enabled ab initio quantum Protein alignment according to BLASTP Single only Implicit (5x), Explicit (2x) Solvent Yes chemistry software package mCUDA-MEME Ultrafast scalable motif discovery HALMD Large-scale simulations of simple and Running Hartree-Fock and DFT energy on algorithm complex liquids GPU, Supports s, p, d, f orbitals on energy based on MEME Simple fluids and binary mixtures (pair calculation, HF gradient with s,p,d orbital Scalable motif discovery algorithm based potentials, high-precision NVE and NVT, support, GPU-based ERI generator on MEME dynamic correlations) Yes Yes Single only MUMmerGPU High-throughput local sequence HOOMD-Blue Particle dynamics package written alignment grounds TeraChem Quantum chemistry software designed to program up for GPUs run on NVIDIA GPU Aligns multiple query sequences against Written for use only on GPUs Yes Full GPU-based solution; Performance reference sequence in parallel LAMMPS Classical molecular dynamics package compared to GAMESS CPU version TBD Lennard-Jones, Gay-Berne, Tersoff, and Yes NVBIO 
NVBIO is an open source C++ library many more potentials Materials Science of reusable components designed to Yes APPLICATION DESCRIPTION SUPPORTED accelerate bioinformatics applications using NAMD Designed for high-performance simulation FEATURES MULTI-GPU SUPPORT CUDA. of large molecular systems gWL-LSMS Materials code for investigating the Data structures, algorithms, and utility Full electrostatics with PME and most effects routines useful for building complex simulation features; 100M atom capable of temperature on magnetism computational genomics applications on Yes Generalized Wang-Landau method Yes CPU-GPU systems OpenMM Library and application for molecular PEtot First principles materials code that Yes dynamics for HPC with GPUs computes the behavior of the electron NVBowtie A largely complete implementation of the Implicit and explicit solvent, custom forces Yes structures of materials Bowtie2 aligner on top of NVBIO Test Drive the World’s Density functional theory (DFT) plane wave Good coverage of Bowtie2 features and Fastest Accelerator – Free! pseudopotential calculations comparable quality results Take the GPU Test Drive, a free and easy way to Yes Yes experience accelerated computing on GPUs. You QMCPACK Solves the many-body Schrodinger REACTA A modified version of GCTA with improved can equation computational performance, support for run your own application or try one of the preloaded for electronic structures using a quantum Graphics Processing Units (GPUs), and ones, all running on a remote cluster. Try it today. Monte Carlo method additional features. The purpose of REACTA www.nvidia.com/gputestdrive Main features Yes is to quantify the contribution of genetic 02 | POPULAR GPU-ACCELERATED Quantum Espresso/ variation to phenotypic variation for complex APPLICATIONS CATALOG | MAR14 PWscf traits. 
Quantum Chemistry An integrated suite of computer codes GRM creation, REML analysis, Regional APPLICATION DESCRIPTION SUPPORTED for electronic structure calculations and Heritability (including multi-GPU) FEATURES MULTI-GPU SUPPORT materials modeling at the nanoscale Yes Abinit Allows to find total energy, charge density PWscf package: linear algebra (matrix SeqNFind SeqNFind® is a powerful tool suite that and electronic structure of systems made of multiply), explicit computational kernels, addresses the need for complete and electrons and nuclei within DFT 3D FFTs accurate alignments of many small Local Hamiltonian, non-local Hamiltonian, Yes sequences against entire genomes utilizing LOBPCG algorithm, diagonalization/ VASP First principles materials code that a unique hardware/software cluster system orthogonalization. Contains BigDFT. computes the behavior of electronic for facilitating bioinformatics research in Yes structures based on quantum theory Next Generation sequencing and genomic ACES III Takes best features of parallel Hybrid Hartree-Fock DFT functionals comparisons. 
implementations of quantum chemistry including exact exchange Hardware and software for reference methods for electronic structure Yes assembly, BLAST, SW HMM, and de novo Integrating scheduling GPU into SIAL Visualization and Docking Software assembly programming language and SIP runtime APPLICATION DESCRIPTION SUPPORTED Yes environment FEATURES MULTI-GPU SUPPORT SOAP3 GPU-based software for aligning short Yes Amira 5 A multifaceted software platform reads with a reference sequence. It can find ADF Density Functional Theory (DFT) software for visualizing, manipulating, and all alignments with k mismatches, where k package that enables first-principles understanding Life Science and bio-medical is chosen from 0 to 3 electronic structure calculations data Short read alignment tool that is not Fock Matrix, Hessians Yes 3D visualization of volumetric data and heuristic based, reports all answers BigDFT Implements density functional theory surfaces Yes by solving the Kohn-Sham equations Single only SOAP3-dp SOAP3-dp: Ultra-fast GPU-based tool for describing the electrons in a material BINDSURF A virtual screening methodology that short read alignment via index-assisted DFT; Daubechies wavelets, part of Abinit Yes uses dynamic programming Burrows-Wheeler Transformation, only calculations of a wide variety of pricing and iray rendering in Maximus supported GPU simulation using Element 3D Accuweather Storyteller Weather graphics Real-time Dynamic Programming, Multi-GPU support Intuvision Panoptes 3.0 Video Analytics Object risk measures on a broad range of asset Yes CUDA 3D object based particle system Faster effects Yes Single only Yes recognition and change detection Yes classes and derivatives Inventor 3D mechanical design, documentation, and Single only POPULAR GPU-ACCELERATED APPLICATIONS ASUCA Regional atmospheric model Entire model UGENE Opensource Smith-Waterman for SSE/ LuciadLightspeed Geospatial Visualization and Existing models code in C# supported product 
simulation Maxon Cinema 4D 3D modeling, animation, and CATALOG | MAR14 | 11 Yes CUDA, Suffix array based repeats finder and Analysis Geospatial Situational Awareness Single transparently, with minimal code changes, Uses BIM for intelligent building rendering Increased model complexity at interactive Video Copilot Optical CAM - SE Global atmosphere model for climate dotplot only Supports multiple backends including components to improve design accuracy rates Flares research Fast short read alignment Yes MotionDSP Ikena ISR Real-time Full Motion Video CUDA and OpenCL, Switches transparently Single only Single only Lens flares plug-in for After Effects Faster effects Dynamical core Yes WideLM Fits numerous linear models to a fixed and WAMI between multiple GPUs and CPUS Revit Building Information Modeling (BIM) for NewTek Lightwave 3D modeling, animation, and Single only COSMO Regional atmospheric model Entire model design and response enhancement and analytics depending on the deal support and load structural engineering, and construction rendering Increased model complexity at interactive Video Copilot Twitch Video effects plug-in for After Yes Parallel linear regression on multiple Real-time super-resolution-based video factors Modeling (BIM) to design, build, and rates Effects Faster effects Single only GEOS-5 Global climate model Entire model Yes similarly-shaped models enhancement, filtering, mosaicing, video Yes maintain higher-quality, more energyefficient Single only EDITING HOMME Dynamical core for global atmospheric Yes analytics, and transcoding Manufacturing: CAD and CAE buildings Otoy Octane Render GPU Renderer CUDA-based APPLICATION DESCRIPTION SUPPORTED model POPULAR GPU-ACCELERATED APPLICATIONS Yes COMPUTATIONAL FLUID DYNAMICS Single only GPU final-frame rendering Yes FEATURES MULTI-GPU SUPPORT Dynamical core Yes CATALOG | MAR14 | 05 Nervve ViD SrX Video Analytics Object recognition APPLICATION DESCRIPTION SUPPORTED NVIDIA Iray A ready-to-integrate, 
physically-based, Pixologic Sculptris 3D sculpting Increased model Adobe Photoshop CC Image editing Over 30 effects HYCOM Ocean circulation model Dynamical core NUMERICAL ANALYTICS and tracking Yes FEATURES MULTI-GPU SUPPORT photorealistic rendering solution complexity at interactive for smoother image Single only APPLICATION DESCRIPTION SUPPORTED NerVve Visual Search Altair AcuSolve General purpose CFD software Iray Interactive; Iray Photoreal; Iray rates manipulation in Mercury Graphics Engine Meteo Earth Weather graphics Real-time Single only FEATURES MULTI-GPU SUPPORT Solution (NVSS) Linear equation solver Yes Cluster. Fast interactive ray tracing; Single only Single only MITgcm Ocean circulation model Dynamical core ArrayFire Comprehensive GPU function library Video/Image Live and Forensic Search Video and ANSYS Fluent General purpose CFD software Physically-based, global-illumination Redshift Renderer GPU-accelerated, biased renderer Adobe Premiere Pro CC Video editing Mercury Single only Hundreds of functions for math, signal/ Image content search Yes Radiation heat transfer model, linear rendering; Distributed cluster rendering CUDA-based GPU final-frame rendering Yes Playback Engine for real-time NEMO Ocean circulation model Entire model (GYRE image processing, statistics, and more. OpCoast SNEAK Electromagnetic signals equation solver Yes Side Effects Houdini 3D modeling, animation, and video editing & accelerated rendering config) Yes Yes propagation Yes Dassault Systemes CATIA rendering Maximus-supported GPU simulation using Yes NIM Dynamical core for global atmospheric Mathematica Wolfram A symbolic technical modeling for complex urban and terrain Autodesk Moldflow Plastic mold injection software - Live Rendering OpenCL Apple Final Cut Pro Video editing Faster effects model computing language environments Linear equation solver Single only Photorealistic rendering Interactive. 
Fully integrated Single only Single only Dynamical core Yes and development environment Ray tracing, DTED and remote sensing CPFD Barracuda-VR and in CATIA V6. The Foundry Mari 3D paint Increased model Avid Media Composer Video editing Faster video Weather Central Fusion Development environment for CUDA and inputs Barracuda Network rendering complexity at interactive effects, unique stereo 3D Studio OpenCL Yes Fluidized bed modeling software Linear equation Yes rates capabilities Weather graphics Real-time Single only Yes PCI Geomatics GXL Geospatial Visualization Image solver, particle Bunkspeed Suite Easy to use photorealistic renderingSingle only Single only WRF Regional numerical weather prediction MATLAB by Mathworks GPU acceleration for orthorectification and additional calculations software 10 | POPULAR GPU-ACCELERATED Grass Valley Edius Video editing Faster effects model MATLAB (high-level image processing Single only Iray-based ray-tracing. Animation support. APPLICATIONS CATALOG | MAR14 Single only WSM5, WSM3, Ice Microphysics models Yes technical computing language) Yes FluiDyna Culises for Network rendering COLOR CORRECTION AND GRAIN Harris Velocity Video editing Faster effects Single WSI TrueView Max Weather graphics Real-time Support for 200+ of most used MATLAB GAIA Multi-GPU, Multi-Machine distributed OpenFOAM Yes MANAGEMENT only Single only functions (incl. Signal Processing, Image object store providing SQL style query Solver library for general purpose CFD RTT DeltaGen Redefines high-end 3D visualization APPLICATION DESCRIPTION SUPPORTED Quantel Qube Broadcast video editing Faster video POPULAR GPU-ACCELERATED APPLICATIONS Processing, Communications Systems, etc) capability, advanced geospatial query software and FEATURES MULTI-GPU SUPPORT effects, unique stereo 3D CATALOG | MAR14 | 13 Yes capability,heatmap generation, and Linear equation solvers Yes realtime interaction. 
This latest version Adobe SpeedGrade CC Color grading Real-time capabilities Oil and Gas NMath Premium GPU-accelerated math and distributed rasterization services FluiDyna LBultra General purpose CFD software gives users a broad suite of robust new grading and finishing with Single only APPLICATION DESCRIPTION SUPPORTED statistics for Built for GPUs, scales to many machines Lattice-Boltzmann solver Yes features to truly revolutionize processes Lumetri Deep Color Engine Sony Vegas Pro Video editing Faster video effects FEATURES MULTI-GPU SUPPORT .NET, automatically detects the presence and many cards without end user query Prometech Particleworks Particle-based CFD and help increase visual quality, speed, and Single only and encoding Single only Acceleware of a CUDA-enabled GPU at runtime modification or pre-meditation software Implicit and explicit solvers Single only flexibility. Assimilate Scratch Color grading and finishing ENCODING AND DIGITAL DISTRIBUTION AxRTM and seamlessly redirects appropriate Yes Vratis ARAEL General purpose CFD software based Interactive ray tracing and global Accelerated debayering for real-time APPLICATION DESCRIPTION SUPPORTED AxKTM computations to it. Computational Finance on illumination. Integration with Siemens digital finishing FEATURES MULTI-GPU SUPPORT Seismic Processing RTM, Kirchhoff, control source, Automatically offloads computations to the APPLICATION DESCRIPTION SUPPORTED FVM with OpenFOAM compatibility TeamCenter. Cluster support Realtime Single only Cinnafilm Tachyon Standards conversion Video electromagnetism, forward modeling GPU FEATURES MULTI-GPU SUPPORT Linear equation solver Yes & Offline Production Process Integration Blackmagic DaVinci procession and encoding Yes Yes Single only Aaon Benfield Turbostream Ltd. CFD software for turbomachinery and scene building. 
Scene Analysis, Xplore Resolve Digimetrics Aurora Automated video and audio test CGGV GeoVation Seismic Processing Multiple PHYSICS Pathwise™ flows Explicit solver Yes DeltaGen, SDK for DeltaGen Color grading Real-time color correction, real-time and algorithms (RTM, etc) Yes APPLICATION DESCRIPTION SUPPORTED Specialized platform for real-time hedging, Vratis SpeedIT extreme Yes denoising measurement ffA Geoteric Seismic Interpretation Attributes FEATURES MULTI-GPU SUPPORT valuation, pricing and risk management for OpenFOAM PTC Creo Parametric Parametric design solution Yes Video encoding and video processing Single only calculations, geobodies Chroma Lattice Quantum Chromodynamics (LQCD) Spreadsheet-like modeling interfaces, Solver library for general purpose CFD suite Anti-aliasing, better lighting and enhanced Cinnafilm Dark Energy Application and plug-in for Elemental Live Live streaming video processing and extraction Wilson-clover fermions, Krylov solvers, Python-based scripting environment and software shaded-with-edges mode. GRID Support image encoding Yes Domain-decomposition Grid middleware Linear equation solvers Yes Single only enhancement Video encoding and video processing Yes ffA SVI Pro Seismic Interpretation Attributes Yes Yes RESEARCH CFD DEVELOPMENTS Siemens PLM NX and Image de-noising and restoration. 
Elemental Server File-based video processing and calculations, geobodies ENZO 3D block-structured AMR code for Hanweck Associates Real-time options analytical FEFLO (GMU-Lohner) General purpose CFD Teamcenter Simultaneous active and background encoding Video encoding and video processing Yes extraction cosmological structure formation engine (Volera) Real-time options analytics engine software for Product lifecycle management solutions rendering isovideo Viarte Video standards conversion CUDA- Yes Accelerated magneto hydrodynamics Yes compressible and incompressible flows from design to simulation to production to Yes accelerated video procession and ffA SEA3D Pro Seismic Interpretation Attributes solvers Murex MACS Analytics Implicit and explicit solver Yes service Digital Vision Nucoda Color grading Real-time color encoding calculations, geobodies Yes Library SD++ (StanfordJameson) Design software, NX, and PLM viewer correction Single only Yes extraction GTC Simulates microturbulence and transport in Analytics library for modeling valuation and General purpose CFD software for applications, TcVis and Active Workspace Marquise Technologies MainConcept CUDA Yes magnetically confined fusion plasma risk for derivatives across multiple asset compressible flows. Client. Rain H.264/AVC Encoder SDK GeoStar Seismic Suite Seismic Processing Multiple Electron push and shift (accounting for classes. 
Explicit solver Yes Single only Color grading CUDA-based real-time color correction H.264 video encoder Video encoding and video algorithms (RTM, etc) Yes >80% of run time) Market standard models for all asset S3D (Sandia and Oak POPULAR GPU-ACCELERATED APPLICATIONS Single only processing Yes Headwave Suite Seismic Interpretation Attributes Yes classes paired with the most efficient Ridge NL) CATALOG | MAR14 | 09 Quantel Pablo Rio Color grading and finishing Real Sorenson Squeeze Video transcoding application andcalculations, Volume Rendering Yes GTS Simulates microturbulence and the motion resolution methods (Monte Carlo Direct numerical solver (DNS) for turbulent ELECTRONIC DESIGN AUTOMATION time color correction Yes plug-In Video encoding and video processing Yes HUE HUEspace Seismic Interpretation Interpretation of charged particles and interactions in simulations and Partial Differential combustion APPLICATION DESCRIPTION SUPPORTED Red Digital Cinema Snell Alchemist on development platform Yes fusion plasma Equations) Chemistry model Yes FEATURES MULTI-GPU SUPPORT REDCINE-X Demand OpenGeo Solutions Push and shift for both electron and ion Yes DualSPHysics SPH-based CFD software SPH model Agilent Technologies ADS Simulation tool for design Color grading CUDA-accelerated debayering Single Video standards conversion Video procession and OpenSeis dynamics Numerical Algorithms Yes of RF, microwave only encoding Yes Seismic Processing Spectral Decomposition Yes Yes Group (NAG) NASA FUN3D General purpose CFD software Linear and high speed digital circuits SGO Mistika Color grading and finishing Real-time Telestream Vantage Video transcoding and Panorama Tech Seismic Processing, Modeling MILC Lattice Quantum Chromodynamics (LQCD) Random number generators, Brownian equation solver Single only Signal integrity simulation Single only color correction and finishing Single only processing Video encoding and video processing Yes Multiple algorithms (RTM, etc) Yes codes 
simulate how elemental particles bridges, and PDE solvers Agilent Technologies The Pixel Farm PFClean Image restoration and ON-AIR GRAPHICS Paradigm Echos RTM Seismic Processing RTM are formed and bound by the “strong force” Monte Carlo and PDE solvers Single only EMPro remastering CUDA-based image processing APPLICATION DESCRIPTION SUPPORTED algorithm Yes to create larger particles like protons and RMS Catastrophic risk modeling for FSI COMPUTATIONAL STRUCTURAL MECHANICS Modeling and simulation environment for acceleration FEATURES MULTI-GPU SUPPORT Paradigm SKUA Reservoir Modeling Faults, Horizons neutrons (earthquakes, hurricanes, terrorism, APPLICATION DESCRIPTION SUPPORTED analyzing 3D EM effects of high speed and Single only Avid On-air Graphics Motion Graphics Real-time and Flow Simulation Grid Yes Staggered fermions, Krylov solvers, infectious diseases) FEATURES MULTI-GPU SUPPORT RF/Microwave components COMPOSITING, FINISHING AND EFFECTS graphics rendering Single only Paradigm Geophysical Gauge-link fattening Risk analytics Yes Abaqus/Standard Simulation and analysis tool for FDTD solver Yes APPLICATION DESCRIPTION SUPPORTED Brainstorm Estudio Virtual Sets and motion graphics VoxelGeo Yes Tanay ZX Lib (Fuzzy structural ANSYS Nexxim Circuit simulation engine for FEATURES MULTI-GPU SUPPORT Real-time graphics rendering Single only Seismic Interpretation Volume Rendering, Horizon PIConGPU A relativistic Particle-in-Cell code that Logic) mechanics RF/analog/ ABSoft Neat Video Video noise reduction plug-in Chyron HyperX On-air graphics Real-time graphics Flattening Yes describes the dynamics of a plasma by Financial analytics and data mining library Monte Direct sparse solver Yes mixed-signal IC design; IBIS-AMI analysis CUDA-based acceleration Single only rendering Single only Roxar RMS Reservoir Modeling Multi GPU computing the motion of electrons and ions Carlo simulations, pricing of vanilla ANSYS 
GPU-accelerated applications catalog (continued)
Source: NVIDIA "Popular GPU-Accelerated Applications Catalog", MAR14. Each catalog entry gives the application, a description, the supported features, and multi-GPU support ("Yes" or "Single only"). For more information on GPU-accelerated applications, visit www.nvidia.com/teslaapps.

Applications named on these catalog pages:
• Physics: QUDA, RAMSES, Chemora
• Seismic processing, seismic interpretation, and reservoir simulation: HUEspace, Seismic City, SpectraSeis, Stoneridge Technologies GAMPACK, Terraspark Insight Earth, Tsunami RTM, WesternGeco Omega2 RTM
• Financial computing, GPU programming tools, and data visualization: SciComp SciFinance, Xcelerit SDK, Synerscope, QuantAlea Alea.cuBase, Altimesh Hybridizer C#, MISYS Global Risk
• Structural mechanics: Impetus Afea, LS-DYNA Implicit, MSC Nastran, MSC Marc, OptiStruct, DS Exsight, DS DesignSight, PAM-CRASH Implicit, NX Nastran
• Electromagnetics and electronic design: ANSYS HFSS, CST Microwave Studio (MWS), Delcross Savant, EMSS FEKO, Remcom XFdtd, SPEAG SEMCAD-X, Rocketick RocketSim
• Computer aided design: AutoCAD, AutoCAD Design Suite
• Animation, modeling and rendering: Autodesk 3ds Max + NVIDIA iray, Autodesk Maya, Autodesk Motion Builder, Autodesk Mudbox
• Effects, finishing and color grading: Adobe After Effects CC, Autodesk Flame Premium, Autodesk Smoke, Boris FX Continuum Complete, Cinnafilm Dark Energy Plug-in, CoreMelt Complete, eyeon Fusion, GenArts Monsters GT, GenArts Sapphire, ROBUSKEY, Neat Video Open FX, NewBlueFX Video Essentials, NewBlue Titler Pro, Pixelan AnyFX, Re:Vision Effects, Red Giant Effects Suite, Red Giant Magic Bullet Looks
• On-air graphics and virtual sets: Chyron LEX, Harris Inscriber, Monarch dScript 3D, Monarch Twister Pro, Monarch Virtuoso, Pixel Power Clarity, RT Software tOG, Vizrt Viz Engine, Wasp 3D Beehive
• On-set, review and stereo tools: 3ality Intellicam, BlueFish 4K Review, Colorfront On-Set Dailies, eyeon Dimension, Lightcraft Previzion, Tweak RV
• Simulation: 3DAliens Glu3d, Blastcode Kilton/Megaton
• Defense and Intelligence: DigitalGlobe Advanced Ortho Series, Eternix Blaze Terra, Exelis (ITT) ENVI, MiAccLib, MrGeo, GeoWeb3d Desktop
Further applications named on the catalog pages:
• Animation, modeling and rendering: Autodesk 3ds Max, Cebas finalRender, CentiLeo GPU Render, Chaos V-Ray RT
• Compositing, finishing and review: The Foundry HIERO, The Foundry NUKE and NUKEX, Video Copilot
• Simulation: Jawset TurbulenceFD
• Defense and Intelligence: Incogna GIS, Intergraph Motion Video Analyst. MiAccLib search modes: exact text match, approximate/similarity, wild card, proximity and percentage, multi-keyword and multi-column multi-keyword text search, RadixSort
• Computational finance: SunGard Adaptiv Analytics
• Weather and climate forecasting: Accuweather (weather graphics)

Software Development Ecosystem for Intel Xeon Phi
Compiler
– Open Source: gcc (kernel build only, not for applications), Python
– Commercial: Intel Parallel Studio XE (C++ & Fortran), Intel Cluster Studio XE (C++ & Fortran), CAPS HMPP compiler (beta)
Debugger
– Open Source: gdb
– Commercial: Intel Debugger, Rogue Wave TotalView (beta), Allinea DDT
Libraries
– Open Source: TBB (in Intel Studio XE), MPICH2, FFTW, NetCDF
– Commercial: Intel MKL, Intel MPI, OpenMP, Intel IPP, Cilk™ Plus (in Intel Studio XE products), NAG, Rogue Wave IMSL
Profiling & Analysis Tools
– Commercial: Intel VTune Amplifier XE, Intel Trace Analyzer & Collector, Intel Inspector XE
Workload Scheduler
– Commercial: Altair PBS Professional, Adaptive Computing Moab
System Management
– Commercial: SGI Management Center, Bright Cluster Manager (beta)
For more information: http://software.intel.com/en-us/articles/intel-and-third-party-tools-and-libraries-available-with-support-for-intelr-xeon-phitm

Next generation XEON: Haswell

Next-next generation HPC XEON PHI

"Pyramid 1U"
Model: C1104G-RP5
Chassis Profile: 1U standard-depth
Servers/System: One dual-socket
Chipset: Intel® C600
Max. Processors: Two Intel® Xeon® processor E5-2600 series
Max. CPU TDP: 115W
Memory Slots: 8 DIMM slots
Memory Type: 1600/1333/1066/800 MHz DDR3 ECC Reg
Max. Hard Disk Drives: 4 x 2.5" drives
Expansion Slot: Three PCIe 3.0 x16 double-width slots, one external PCIe 3.0 x8 low-profile slot
Networking (Onboard): Dual-Port GigE controller (Intel® I350)
IPMI Remote Management: Integrated IPMI 2.0
Power Supply: 1800W Redundant* Platinum Level
NVIDIA GPU accelerator support:
• 3x K10, K20, K20X
NVIDIA graphics GPU support:
• Quadro Plex 7000
Intel accelerator support:
• 3x Phi

"Pyramid 2U"
Model: C2110G-RP5
Chassis Profile: 2U standard-depth
Servers/System: One dual-socket
Chipset: Intel® C600
Max. Processors: Two Intel® Xeon® processor E5-2600 series
Max. CPU TDP: 130W
Memory Slots: 8 DIMM slots
Memory Type: 1600/1333/1066/800 MHz DDR3 ECC Reg
Max. Hard Disk Drives: 10 x 2.5" drives
Expansion Slot: Four internal PCIe 3.0 x16 double-width (optional riser card for 2 additional external PCIe 3.0 x16 double-width slots), one external PCIe 3.0 x8 low-profile and one PCIe 2.0 x4 full-height
Networking (Onboard): Dual-Port GigE controller (Intel® I350)
IPMI Remote Management: Integrated IPMI 2.0
Power Supply: 1800W Redundant** Platinum Level
NVIDIA GPU accelerator support:
• 4x K10, K20, K20X
NVIDIA graphics GPU support:
• Quadro Plex 7000
Intel accelerator support:
• 4x Phi

SGI UV 2000
Within a Single Standard 19" Rack
Up to:
• 64 sockets/512 cores/1024 threads
– Or 34 CPU + 30 Intel® Xeon Phi™
– Or 36 CPU + 28 NVIDIA® Tesla® K20, K20X, or K40 (2 partitions)
• 16TB memory
• 63 x16 PCIe Gen 3 Links

SGI UV and Scale-out Compared
Feature | Standard Scale-out Servers | SGI UV
Architecture Reference Terms | Scale Out, Cluster, Distributed Memory | Scale Up, Single System Image, Shared Memory
System Limit | 16 cores, 0.5 TB memory | 2048 cores, 64 TB memory
CPU | x86 Intel® Xeon® or AMD Opteron™ | x86 Intel® Xeon®
Memory, Storage, Networking | Industry standard | Industry standard
Interconnect or System Fabric | Ethernet or InfiniBand | SGI NUMAlink
Hardware Package or Building Block | Blades or Rackmount, 19" rack | Blades (UV2000) or Rackmount (UV20), 19" rack
Software | Off the Shelf | Off the Shelf
Applications | Small scale single apps; cluster or "MPI" apps | Small or large single apps; cluster or "MPI" apps also
Cost | Lowest cost x86 architecture | 20-75% > scale-out, ~1/3 the cost of 'Big Iron'
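The rack-level UV 2000 figures and the "System Limit" row of the comparison can be cross-checked with a little arithmetic. The following is only a sketch under stated assumptions: it treats the quoted numbers as vendor maximums, assumes 1 TB = 1024 GB, and uses variable and function names that are purely illustrative.

```python
# Figures quoted on the slides; treated here as vendor maximums, not measurements.
# Assumption: 1 TB = 1024 GB.
GB_PER_TB = 1024

uv2000_rack = {"sockets": 64, "cores": 512, "threads": 1024, "memory_tb": 16}

system_limits = {
    "standard scale-out server": {"cores": 16, "memory_tb": 0.5},
    "SGI UV": {"cores": 2048, "memory_tb": 64},
}


def memory_gb_per_core(cores, memory_tb):
    """Return the memory available per core, in GB."""
    return memory_tb * GB_PER_TB / cores


# Per-socket and per-core ratios implied by the rack-level maximums.
cores_per_socket = uv2000_rack["cores"] // uv2000_rack["sockets"]
threads_per_core = uv2000_rack["threads"] // uv2000_rack["cores"]
rack_gb_per_core = memory_gb_per_core(uv2000_rack["cores"], uv2000_rack["memory_tb"])

# The two CPU + accelerator mixes listed on the slide each sum to 64.
accelerator_mixes = {"Intel Xeon Phi": (34, 30), "NVIDIA Tesla": (36, 28)}
mixes_sum_to_sockets = all(
    cpus + accelerators == uv2000_rack["sockets"]
    for cpus, accelerators in accelerator_mixes.values()
)

# Memory per core at the quoted system limits of each architecture.
limit_gb_per_core = {
    name: memory_gb_per_core(**spec) for name, spec in system_limits.items()
}

print(f"cores per socket: {cores_per_socket}, threads per core: {threads_per_core}")
print(f"rack memory per core: {rack_gb_per_core:.0f} GB")
for name, value in limit_gb_per_core.items():
    print(f"{name}: {value:.0f} GB per core at the system limit")
```

Under these assumptions the memory-per-core ratio is the same at both system limits, which suggests the difference between the two architectures lies in how far a single system image scales rather than in the per-core memory budget.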
SGI® Value Propositions
• Long history of working with accelerators
– "Home-brewed": Geometry Engine, TPU (Tensor Processor Unit)
– FPGAs (RASC™ technology), GPUs, SOCs
– 50 application experts focused on it
• Accelerators in both a scale-up and scale-out environment
– 32 Intel® Xeon® Phi™ coprocessors in SGI® UV™ 2000 (COSMOS and TGAC)
– SGI: the only vendor able to deliver hybrid scale-out and scale-up solutions
• Everything you need in a powerful solution
– Factory-integrated, tested, rack level delivery: plug in and go
– Starter kits
– Worldwide customer support
• Fully-managed, with SGI Management Center and Performance Suite software

SGI® Integrated Clusters for Intel Xeon Phi
A complete, managed GPU solution of software and hardware, all you need for a powerful deployment
• Hardware stack
– Rackable™ C2108 head node
– Rackable C1104G or C2110G compute nodes
– InfiniBand NICs and switch
– Ethernet switch for management
– Add SGI InfiniteStorage solutions, either direct-attached to the head node or SGI NAS
• Software stack
– SGI Management Center software
– Altair PBSpro Load Management software
• Complete Factory Test & Integration

SGI® Management Center Premium Edition offers Accelerator Monitoring and Management!
• Single system management console with remote server monitoring and control
• Ease of use, full system management including GUI and CLI
• Policy-driven, fine-grained power load management for green computing
• Advanced fault, event, and alert management for improved reliability
• Advanced capabilities including accelerator management, BIOS management, and high availability

SMC Premium Edition – Accelerator Monitoring
Detailed GPU reporting enables high performance systems to remain at their peak performance

Optimized for HPC & Big Data Solutions
• SGI® ICE™ X
• SGI UV™ 2000

SGI ICE X Compute Node with GPU
• Two Intel® Xeon® processor E5-2600 series
• Two NVIDIA® TESLA® K20
• FDR InfiniBand, single or dual plane
• Four DDR3 DIMMs per socket @ 1600 MT/s
• Up to one 2.5" SATA HDD/SSD drive
• Liquid cold sinks
One dual-socket node with two coprocessors in one blade slot!
FDR + 1:1 processor to coprocessor ratio = Balanced Throughput

SGI ICE X Compute Node with Phi
• Two Intel® Xeon® processor E5-2600 series
• Two Intel® Xeon Phi™ 5120D Coprocessor
• FDR InfiniBand, single or dual plane
• Four DDR3 DIMMs per socket @ up to 1600 MT/s
• Expect 1866 MT/s support with 1 DPC populated (with Ivy Bridge-EP)
• Up to one 2.5" SATA HDD/SSD drive
• Liquid cold sinks
One dual-socket node with two coprocessors in one blade slot!
FDR + 1:1 processor to coprocessor ratio = Balanced Throughput

SGI Cold Sink Technology – Twin Blades

SGI Meets All Accelerator Needs!
• SGI has the longest history with accelerators
• Accelerators have gone mainstream; we support them across our product line
• The only vendor able to deploy scale-up, scale-out and hybrid landscapes with accelerators everywhere.
SGI ICE: Sample ICE Customers
BAW - Bundesanstalt für Wasserbau; Bridgestone; Central Research Inst of Electric Pwr Ind; CSIR - Centre for Mathematical Modeling and Compute; Exeter University; GENCI; HLRN Berlin; HLRN Hannover; ICHEC; ICR; Idaho National Laboratory; IFREMER; Imperial College; IMSc; INMET; JAMSTEC; Korean Air Force; KU Leuven; Mazda Motor Corporation; McLaren Motor Racing Ltd; Mercedes-Benz GP; MPO; NASA Ames; NASA Langley; NIIFI; NIMS; NMCAC; NOAA (CSC); NTNU; Onera; ORNL; Pontificia Universidade Catolica – PUC-Rio; Rosshydromet; Scientific Analysis Group; Semiconductor Energy Labs; Sikorsky; Skoda Auto; TI-09; Tokyo University – ISSP; Total; Toyota; U.S. Air Force - Arnold AFB; U.S. Navy/NRL; Universidade Catolica; Universite Paul Sabatier – CICT/Calmip; University of Arizona; University of Hyderabad; University of Oxford; University of São Paulo – USP/IAG; University of Rio de Janeiro – NACAD; University of Rio Grande do Sul – CESUP; US Army TACOM

Hungary 2011-2014
NIIFI supercomputers for non-profit customers:
• Cluster: SGI ICE Debrecen (18 TFlops, 1536 cores, 6 TB RAM)
• Cluster + GPU: HP Szeged (14 TFlops, 2304 cores, 5.6 TB RAM, 6x M2070)
• SMP/ccNUMA: SGI UV Pécs (10 TFlops, 1152 cores, 6 TB RAM)
http://www.niif.hu/szolgaltatasok/szuperszamitastechnika/altalanos_ismerteto

Hungary 2015
• Cluster: 200+ TFlops (200 pc GPU or PHI)
• Cluster: 30+ GPU or PHI
• Cluster: SGI ICE Debrecen (18 TFlops, 1536 cores, 6 TB RAM)
• Cluster + GPU: HP Szeged (14 TFlops, 2304 cores, 5.6 TB RAM, 6x M2070)
• SMP/ccNUMA: SGI UV Pécs (10 TFlops, 1152 cores, 6 TB RAM)

©2012 SGI
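For a rough comparison of the three 2011-2014 NIIF systems, the quoted totals can be reduced to per-core figures. This is a sketch only: it assumes the TFlops values are peak figures for the whole system (for Szeged that may include the six M2070 GPUs, which would overstate the CPU-only value), assumes 1 TB = 1024 GB, and uses illustrative names throughout.

```python
# (name, peak TFlops, cores, RAM in TB) as quoted on the slide.
# Assumption: TFlops values are whole-system peak figures; 1 TB = 1024 GB.
niif_systems = [
    ("SGI ICE Debrecen", 18.0, 1536, 6.0),
    ("HP Szeged", 14.0, 2304, 5.6),
    ("SGI UV Pécs", 10.0, 1152, 6.0),
]


def per_core(tflops, cores, ram_tb):
    """Return (GFlops per core, GB of RAM per core) for one system."""
    gflops_per_core = tflops * 1000 / cores
    ram_gb_per_core = ram_tb * 1024 / cores
    return gflops_per_core, ram_gb_per_core


summary = {
    name: per_core(tflops, cores, ram_tb)
    for name, tflops, cores, ram_tb in niif_systems
}
total_tflops = sum(tflops for _, tflops, _, _ in niif_systems)

for name, (gflops, ram_gb) in summary.items():
    print(f"{name}: {gflops:.2f} GFlops/core, {ram_gb:.2f} GB RAM/core")
print(f"combined peak: {total_tflops:.0f} TFlops")
```

Set against the 200+ TFlops cluster listed for 2015, the combined peak of the three existing systems is roughly a fifth of that target.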