ENVISION. ACCELERATE. ARRIVE.

Accelerating Mathematica®: Vectors for all
Simon McIntosh-Smith
VP of Applications, ClearSpeed Technology
simon@clearspeed.com
Wolfram Technology Conference, 12 October 2006
Copyright © 2006 ClearSpeed Technology plc. All rights reserved.
www.clearspeed.com

Agenda
• Introduction
• Accelerators
• ClearSpeed math acceleration technology
• Accelerating Mathematica
• Summary

Introduction
• Mathematica® is being used to solve more and more computationally intensive problems
• General-purpose CPUs keep getting faster, but a new wave of application accelerators is emerging that could give much greater performance
  – Much as GPUs have done for graphics
• ClearSpeed has been developing hardware accelerators focused specifically on scientific computing, which accelerate the low-level math libraries used by Mathematica

Accelerators

Accelerator technologies
• Visualization and media processing
  – Good for graphics, video, game physics, speech, …
  – Graphics Processing Units (GPUs) are well established in the mainstream
  – But there was a time not too long ago when your PC still did all the graphics in software on the main CPU…
  – Can be applied to some 32-bit applications, but they have low accuracy (not IEEE 754 floating point), are fairly hard to program, and are very power hungry!
• Embedded content processing
  – Data mining, encryption, XML, compression
  – Field Programmable Gate Arrays (FPGAs) are often used here, mainly to accelerate integer-intensive codes
  – Poor at floating point, especially 64-bit, and they cut corners on precision, so accuracy suffers
  – Very hard to program for good performance

Accelerator technologies continued
• Math accelerators
  – Mostly floating point; 64-bit performance is crucial, with high precision and true IEEE 754 floating point support
    • (“Video game FLOPS” may be fast and cheap, but you get what you pay for, and what’s the wrong answer really worth?)
  – Can accelerate numerically intensive applications in
    • Finance
    • Oil and gas
    • Economics
    • Electromagnetics
    • Bioinformatics
    • And many, many more
  – This is what ClearSpeed has developed
• To accelerate Mathematica, a true math accelerator is needed…

The other benefit of accelerators – low power
• Running 1 watt for 1 year costs about $1
• Modern CPUs can consume around 100 W
  – $100/year running cost for the CPU alone if used 24/7
• Accelerators typically bring significant performance-per-watt gains
  – Examples later in this presentation show 1 CPU plus a 25 W ClearSpeed board running as fast as a 4 CPU (8 core) machine
  – This reduction in power consumption of around 275 W (about 400 W for four CPUs versus 125 W for one CPU plus the board), if sustained 24/7, is a $275 per year energy cost saving
  – Not to mention how much smaller and quieter the accelerated system can be…

ClearSpeed’s Math Acceleration Technology
What are ClearSpeed’s products?
• Math accelerator board, the ClearSpeed Advance™
  – Dual ClearSpeed CSX600 coprocessors
  – R∞ ≈ 50 GFLOPS for 64-bit matrix multiply (DGEMM) calls
    • Hardware also supports 32-bit floating point
  – 133 MHz PCI-X, two-thirds length (8”) form factor
  – 1 GByte of memory on the board
  – Linux drivers today for Red Hat and SUSE
    • Windows coming in a future release
  – Low power: around 25 watts
• Significantly accelerates the low-level math library used by Mathematica (MKL):
  – Target functions: Level 3 BLAS, LAPACK, FFTs

Which MKL functions can ClearSpeed accelerate?
Level 3 BLAS:
• DGEMM
• ZGEMM (upcoming release)
• Under development – DTRSM and others
LAPACK functions for:
• LU (DGETRF)
• QR (upcoming release)
• Cholesky (upcoming release)
• Under development – eigenvalues, SVD, …
FFTs:
• Acceleration for large 2D and 3D FFTs to be added in the future
• Better yet are compound FFT-based functions, such as convolution
Trig and other functions:
• Exploring long vectors of sin, cos, exp, log, sqrt et al.

Software development kit (SDK)
• C compiler with vector extensions (ANSI C-based commercial compiler), assembler, libraries, ddd/gdb-based debugger, newlib-based C run-time library (C-rtl), etc.
• ClearSpeed Advance development boards
• Available for Linux and Windows

Accelerating Mathematica
Mathematica uses libraries underneath
[Diagram: software stack. Software: Mathematica; BLAS & LAPACK library (Intel’s MKL). Hardware: CPU.]

Mathematica using accelerated libraries
[Diagram: software stack. Software: Mathematica; BLAS & LAPACK library (Intel’s MKL); ClearSpeed’s CSXL library. Hardware: CPU; ClearSpeed Advance™ board.]

Plug-and-Play – No changes to your notebooks
• Mathematica has used MKL by default since v5.2
• ClearSpeed provides a modified kernel
  – Uses a modified “math” script that launches the kernel
  – Sets the library path to pick up CSXL as well as MKL
• Functions supported in Mathematica today:
  – Dot[]
  – Det[]
  – LUDecomposition[]
  – LinearSolve[]
  – Inverse[]
• If your notebooks spend a high percentage of their total runtime in these functions, and a lot of time in each call to them, then you may have a candidate for ClearSpeed acceleration!

What kind of notebooks could be accelerated?
• ClearSpeed has been collaborating with ScienceOps to discover what kinds of problems are accelerated
  – Also see ScienceOps’ own talk here!
• Early results show a good breadth of applications being accelerated
  – Performance improvements
  – Ability to run larger problem sets
• Initial results show speedups ranging from 2X to 5X
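The rule of thumb on the plug-and-play slide (a high percentage of runtime spent in the supported functions) can be made concrete with Amdahl's law. The sketch below is illustrative only: the function name, the runtime fractions, and the 5X library speedup are assumptions chosen for the example, not measurements from the talk.

```python
def overall_speedup(accelerated_fraction: float, library_speedup: float) -> float:
    """Amdahl's law: whole-notebook speedup when only part of the runtime is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if library_speedup <= 0.0:
        raise ValueError("library_speedup must be positive")
    remaining = 1.0 - accelerated_fraction  # time that still runs at host speed
    return 1.0 / (remaining + accelerated_fraction / library_speedup)


# Hypothetical notebooks: share of runtime spent in Dot[], LinearSolve[], etc.
# The 5.0 library speedup is an assumed figure for illustration.
for fraction in (0.50, 0.80, 0.95):
    speedup = overall_speedup(fraction, 5.0)
    print(f"{fraction:.0%} of runtime in accelerated calls -> {speedup:.2f}x overall")
```

Under these assumptions, a notebook that spends only half its time in the supported functions gains far less than the library itself does, which is why the fraction of runtime matters as much as the raw accelerator speed.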
Example notebooks
• Benchmarked on a fast server for comparison:
  – 4 processors, each dual core (8 cores total), AMD Opteron 870 (2 GHz) with 32 GBytes of memory, running Linux RHE4-64
• Comparisons are between:
  – Using 2 Opteron cores on their own
  – Using all 8 Opteron cores on their own, and
  – Using 2 Opteron cores with a single ClearSpeed Advance accelerator board
• The notebooks are very new, and we believe there is more performance to come from the accelerated versions with a bit more tuning

Example notebook descriptions
• ANOVA
  – Analysis of variance, a linear least squares minimisation, fitting a curve to sampled data
• Microarray
  – Microarray data analysis; determines coexpression networks, i.e. sets of genes that are commonly expressed together under different experimental conditions. Calculates a distance – distance metric
• ImageDecode
  – Progressive decoding of images using the Haar wavelet transform
• Spatial Auto Regression (SAR)
  – Simple regressions iterating on large, dense matrices

Example – ANOVA
[Chart: ANOVA speedup. Time in seconds (lower is better), 0 to 100, against number of predictors (500, 1000, 2000, 4000). Series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator.]
• ANOVA notebook benefits from a 2X speedup with 4,000 predictors
• Two cores with a ClearSpeed accelerator are equivalent in performance to an eight-core machine!
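The eight-core comparison ties back to the power figures quoted earlier in the talk (about 100 W per CPU, 25 W for the board, roughly $1 per watt-year). A short calculation can check that arithmetic. This is a sketch under stated assumptions: the electricity price is not given in the talk and is chosen here so that one watt for a year costs about $1, and only CPU and board power are counted, not the rest of the system.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours of 24/7 operation
CPU_WATTS = 100.0           # per-CPU figure quoted in the talk
BOARD_WATTS = 25.0          # ClearSpeed Advance board figure quoted in the talk
PRICE_PER_KWH = 0.114       # assumption: makes 1 W for a year cost about $1


def annual_cost(watts: float, price_per_kwh: float = PRICE_PER_KWH) -> float:
    """Cost in dollars of drawing `watts` continuously for one year."""
    kwh = watts * HOURS_PER_YEAR / 1000.0
    return kwh * price_per_kwh


four_cpu_host = 4 * CPU_WATTS                # 8-core comparison machine, CPUs only
accelerated_host = CPU_WATTS + BOARD_WATTS   # one dual-core CPU plus the board
saving_watts = four_cpu_host - accelerated_host

print(f"Power saved: {saving_watts:.0f} W")
print(f"Approximate annual saving: ${annual_cost(saving_watts):.0f}")
```

With a different local electricity price, only `PRICE_PER_KWH` needs to change; the 275 W difference itself follows directly from the quoted wattages.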
Example – Microarray
[Chart: Microarray speedup. Time in seconds (lower is better), 0 to 35, against yeast size (800^2, 1000^2, 2000^2, 4000^2). Series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator.]
• Microarray notebook benefits from nearly a 3X speedup with 4,000 inputs
• Larger problems may receive even more speedup
  – Data sets with over 6,000 expression levels exist for yeast

Example – ImageDecode
[Chart: ImageDecode speedup. Time in seconds (lower is better), 0 to 140, against image size (1024x1024, 1600x1200, 3072x2304). Series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator.]
• ImageDecode notebook speedup ranges from 2X to 3X depending on the image size
• When tuned, this speedup should also be achieved for images around 960x960 in size (already around 1.6X)

Example – Spatial Auto Regression
[Chart: SAR speedup. Time in seconds (lower is better), 0 to 1200, for problem size 50 (0.5 GBytes). Series: host speed (2 cores), host speed (8 cores), 2 cores + accelerator.]
• SAR notebook speedup is nearly 2X
• Larger problems should receive even more speedup
  – Run-times are quite substantial too
Summary
• Accelerators can significantly increase performance and performance per watt across a range of interesting applications in Mathematica
• You need a real 64-bit math accelerator for Mathematica to deliver the precision you depend upon
• ClearSpeed can accelerate notebooks making intensive use of Dot[], Det[], LUDecomposition[], LinearSolve[] and Inverse[]
  – More in the future as the libraries are developed
• Plug-and-play – no changes to your notebooks
• How fast can you go?

Recent news!
• ClearSpeed’s accelerators don’t just accelerate your workstation or server; they can be used to build supercomputers too!
• Announced this Monday: Tokyo Tech has accelerated its Linux supercomputer, TSUBAME, from 38 TFLOPS to 47 TFLOPS with 360 ClearSpeed Advance boards
  – An increase in performance of 24%, for just a 1% increase in power consumption
[Photo: Professor Matsuoka standing beside TSUBAME at Tokyo Tech]

A special offer at WTC06!
• If you want to put supercomputer technology in your own machine, ClearSpeed has a special offer at WTC:
  – 37.5% discount available to the first twenty Wolfram Technology Conference attendees purchasing a ClearSpeed Advance accelerator board under the terms of this limited offer
  – $4,995 plus local sales taxes
• Talk to a ClearSpeed representative at the conference to find out if your machine is compatible
  – Launching on x86 for Linux RHE3/4 & SLES9