ENVISION. ACCELERATE.
ARRIVE.
Accelerating Mathematica®:
Vectors for all
Simon McIntosh-Smith
VP of Applications, ClearSpeed Technology
simon@clearspeed.com
Copyright © 2006 ClearSpeed Technology plc. All rights reserved.
Wolfram Technology Conference
12 October 2006
www.clearspeed.com
Agenda
• Introduction
• Accelerators
• ClearSpeed math acceleration technology
• Accelerating Mathematica
• Summary
Introduction
• Mathematica® is being used to solve more and
more computationally intensive problems
• General purpose CPUs keep getting faster, but
a new wave of application accelerators is
emerging that could give much greater
performance
– Much as GPUs have done for graphics
• ClearSpeed has been developing hardware
accelerators specifically focused on scientific
computing, and which accelerate the low-level
math libraries used by Mathematica
Accelerators
Accelerator technologies
• Visualization and media processing
– Good for graphics, video, game physics, speech, …
– Graphics Processing Units (GPUs) are well established in the
mainstream
– But there was a time not too long ago when your PC still did all
the graphics in software on the main CPU…
– GPUs can be applied to some 32-bit applications, but they have low accuracy
(not IEEE754 floating point), are fairly hard to program, and are very
power hungry!
• Embedded content processing
– Data mining, encryption, XML, compression
– Field Programmable Gate Arrays (FPGAs) are often being used
here, mainly to accelerate integer-intensive codes
– FPGAs are poor at floating point, especially 64-bit, and cut corners on
precision, so they don’t deliver good accuracy
– Very hard to program and get good performance
Accelerator technologies continued
• Math Accelerators
– Mostly floating point, 64-bit performance is crucial, high
precision, supporting true IEEE754 floating point
• (“Video game FLOPS” may be fast and cheap, but you get what you
pay for, and what’s the wrong answer really worth?)
– Can accelerate numerically-intensive applications in
• Finance
• Oil and Gas
• Economics
• Electromagnetics
• Bioinformatics
• And many, many more
– This is what ClearSpeed has developed
• To accelerate Mathematica, a true Math Accelerator is
needed…
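The precision point above can be illustrated with a small sketch. It is not taken from the talk: it simply sums the value 0.1 one million times, once with every intermediate result rounded to 32-bit IEEE 754 and once in 64-bit, and compares both against the exact answer of 100,000. The helper names (`to_f32`, `sum_f32`, `sum_f64`) are hypothetical, and the 32-bit rounding is emulated with Python's standard `struct` module rather than run on any accelerator.

```python
import struct


def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit IEEE 754 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Accumulate with every intermediate result rounded to 32 bits."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc


def sum_f64(values):
    """Accumulate in ordinary 64-bit floating point."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


values = [0.1] * 1_000_000
exact = 100_000.0

err32 = abs(sum_f32(values) - exact)
err64 = abs(sum_f64(values) - exact)

print(f"32-bit accumulation error: {err32:.6g}")
print(f"64-bit accumulation error: {err64:.6g}")
```

The 32-bit error is many orders of magnitude larger than the 64-bit error, even though each individual addition is correctly rounded. Long accumulations of this kind are what dense linear algebra consists of, which is why 64-bit performance matters for this class of application.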
The other benefit of accelerators – low power
• Running 1 watt for 1 year costs about $1
• Modern CPUs can consume around 100W
– $100/year running cost for the CPU alone if used 24/7
• Accelerators typically bring significant
performance per watt gains
– Examples later in this presentation show 1 CPU plus a
25W ClearSpeed board running as fast as a 4 CPU (8
core) machine
– This reduction in power consumption of around 275W (four 100W CPUs
versus one 100W CPU plus a 25W board), if applied 24/7, is an
energy cost saving of about $275 per year
– Not to mention how much smaller and quieter the
accelerated system can be…
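The arithmetic behind these figures can be sketched as follows. The electricity price `USD_PER_KWH` is an assumption, chosen so that one watt running for a year costs roughly $1 as the slide states; the wattages are the round numbers from the slide, and only CPU and board power are counted, not the rest of the system.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous (24/7) operation

# Assumed electricity price in dollars per kilowatt-hour (not from the talk).
USD_PER_KWH = 0.114


def annual_cost_usd(watts: float, usd_per_kwh: float = USD_PER_KWH) -> float:
    """Cost of drawing `watts` continuously for one year."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000.0
    return kwh_per_year * usd_per_kwh


CPU_WATTS = 100.0    # "Modern CPUs can consume around 100W"
BOARD_WATTS = 25.0   # ClearSpeed board power quoted on the slide

four_cpu_watts = 4 * CPU_WATTS                  # 4 CPU (8 core) machine
accelerated_watts = 1 * CPU_WATTS + BOARD_WATTS  # 1 CPU plus one board
saved_watts = four_cpu_watts - accelerated_watts

print(f"1 W for a year:  ${annual_cost_usd(1.0):.2f}")
print(f"Power saved:     {saved_watts:.0f} W")
print(f"Saving per year: ${annual_cost_usd(saved_watts):.0f}")
```

With a different local electricity price the dollar figures scale linearly, but the 275W difference does not change.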
ClearSpeed’s Math Acceleration Technology
What are ClearSpeed’s products?
• Math accelerator board, The ClearSpeed Advance™
– Dual ClearSpeed CSX600 coprocessors
– R∞ ≈ 50 GFLOPS for 64-bit matrix multiply (DGEMM) calls
• Hardware also supports 32-bit floating point
– 133 MHz PCI-X, two-thirds length (8”) form factor
– 1 GByte of memory on the board
– Linux drivers today for Red Hat and SUSE
• Windows coming in a future release
– Low power; around 25 Watts
• Significantly accelerates the low-level math library used by
Mathematica (MKL):
– Target functions: Level 3 BLAS, LAPACK, FFTs
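For readers unfamiliar with DGEMM, here is a minimal reference sketch of what the routine computes, C ← αAB + βC in 64-bit arithmetic, together with the standard 2·m·n·k operation count. It is plain Python for illustration only, it omits the transpose options of the real BLAS interface, and the function names are hypothetical. The time estimate treats the quoted 50 GFLOPS as a sustained rate, which is an assumption: R∞ is an asymptotic figure that real calls only approach for large matrices.

```python
def dgemm(alpha, a, b, beta, c):
    """Reference C <- alpha*A*B + beta*C for row-major lists of lists."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be m x n")
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = alpha * acc + beta * c[i][j]
    return out


def dgemm_flops(m, n, k):
    """One multiply and one add per inner-loop step: about 2*m*n*k flops."""
    return 2 * m * n * k


def seconds_at(gflops, m, n, k):
    """Idealised run time if the whole call sustained `gflops` GFLOPS."""
    return dgemm_flops(m, n, k) / (gflops * 1e9)


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]

print(dgemm(1.0, a, b, 0.0, c))
print(dgemm(2.0, a, b, 1.0, c))
print(seconds_at(50.0, 4000, 4000, 4000))
```

Because the work grows as the cube of the matrix dimension while the data grows only as the square, large matrix multiplies give an accelerator a lot of arithmetic per byte transferred, which is one reason Level 3 BLAS is a natural first target.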
Which MKL functions can ClearSpeed accelerate?
L3 BLAS:
• DGEMM
• ZGEMM (upcoming release)
• Under development – DTRSM and others
LAPACK functions for:
• LU (DGETRF)
• QR (upcoming release)
• Cholesky (upcoming release)
• Under development – Eigenvalues, SVD, …
For FFTs:
• Acceleration for large 2D, 3D to be added in the future
• Better yet are compound FFT-based functions, such as
convolution
For trig and other functions:
• Exploring long vectors of sin, cos, exp, log, sqrt, etc.
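To show what an LU routine such as DGETRF provides, the following is a small textbook sketch of LU factorization with partial pivoting, plus the determinant and linear solve that fall out of it cheaply once the factors exist. This is not ClearSpeed's or MKL's implementation, and the function names are hypothetical; production libraries use blocked algorithms that spend most of their time in matrix multiply.

```python
def lu_factor(a):
    """Factor P*A = L*U with partial pivoting.

    Returns (lu, piv, sign): L (unit diagonal, below the diagonal) and U
    packed in one matrix, the row order piv, and the permutation sign.
    """
    n = len(a)
    lu = [row[:] for row in a]
    piv = list(range(n))
    sign = 1.0
    for col in range(n):
        # Choose the largest remaining entry in this column as the pivot.
        p = max(range(col, n), key=lambda r: abs(lu[r][col]))
        if lu[p][col] == 0.0:
            raise ValueError("matrix is singular")
        if p != col:
            lu[col], lu[p] = lu[p], lu[col]
            piv[col], piv[p] = piv[p], piv[col]
            sign = -sign
        for r in range(col + 1, n):
            lu[r][col] /= lu[col][col]       # multiplier, stored in L
            f = lu[r][col]
            for c in range(col + 1, n):
                lu[r][c] -= f * lu[col][c]   # eliminate below the pivot
    return lu, piv, sign


def det(a):
    """Determinant = permutation sign times the product of U's diagonal."""
    lu, _, sign = lu_factor(a)
    d = sign
    for i in range(len(lu)):
        d *= lu[i][i]
    return d


def solve(a, b):
    """Solve A x = b by forward and back substitution on the LU factors."""
    lu, piv, _ = lu_factor(a)
    n = len(lu)
    y = [b[piv[i]] for i in range(n)]        # apply the row permutation
    for i in range(n):                       # forward: L y = P b
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    x = y[:]
    for i in reversed(range(n)):             # back: U x = y
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


a = [[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]]
b = [5.0, -2.0, 9.0]

print(det(a))
print(solve(a, b))
```

The factorization costs on the order of n³ operations, while the determinant and each solve afterwards cost only on the order of n or n², so speeding up the factorization speeds up everything built on it.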
Software development kit (SDK)
• C compiler with vector extensions (ANSI-C based
commercial compiler), assembler, libraries, ddd/gdb-based
debugger, newlib-based C-rtl etc.
• ClearSpeed Advance development boards
• Available for Linux, Windows
Accelerating Mathematica
Mathematica uses libraries underneath
[Diagram: Software layers: Mathematica on top of the BLAS & LAPACK library (Intel’s MKL). Hardware layer: CPU.]
Mathematica using accelerated libraries
[Diagram: Software layers: Mathematica on top of the BLAS & LAPACK library (Intel’s MKL) and ClearSpeed’s CSXL Library. Hardware layer: CPU and ClearSpeed Advance™ board.]
Plug-and-Play – No changes to your notebooks
• Mathematica defaults to using MKL since v5.2
• ClearSpeed provides a modified kernel
– Uses a modified “math” script that launches the kernel
– Sets the library path to pick up CSXL as well as MKL
• Functions supported in Mathematica today:
– Dot[]
– Det[]
– LUDecomposition[]
– LinearSolve[]
– Inverse[]
• If your notebooks spend a high percentage of their total
runtime in these functions, and a lot of time in each call
to these functions, then you may have a candidate for
ClearSpeed acceleration!
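The "high percentage of total runtime" condition can be made concrete with Amdahl's law, which is not named in the talk but is the standard way to estimate this. The sketch below assumes the accelerated functions run some factor faster while the rest of the notebook is unchanged; the function name and the example fractions are hypothetical.

```python
def overall_speedup(fraction_accelerated: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only part of it gets faster.

    fraction_accelerated: share of the original runtime spent in the
        accelerated functions (0.0 to 1.0).
    kernel_speedup: how many times faster those functions become.
    """
    if not 0.0 <= fraction_accelerated <= 1.0:
        raise ValueError("fraction_accelerated must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - fraction_accelerated
    return 1.0 / (remaining + fraction_accelerated / kernel_speedup)


# Illustrative inputs only: the same 10X kernel speedup helps very
# differently depending on how much of the runtime it applies to.
for fraction in (0.5, 0.9, 0.99):
    print(f"{fraction:.0%} of runtime accelerated 10X -> "
          f"{overall_speedup(fraction, 10.0):.2f}X overall")
```

A notebook that spends half its time elsewhere cannot exceed 2X overall, however fast the accelerated calls become. The second condition on the slide, a lot of time in each call, matters separately because each call carries a fixed overhead that short calls cannot amortize.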
What kind of notebooks could be accelerated?
• ClearSpeed has been collaborating with ScienceOps to
discover what kinds of problems are accelerated
– Also see ScienceOps’ own talk here!
• Early results show a good breadth of applications being
accelerated
– Performance improvements
– Ability to run larger problem sets
• Initial results show speedups ranging from 2X to 5X
Example notebooks
• Benchmarked on a fast server for comparison:
– 4 processors, each dual core (8 cores total), AMD Opteron
870 (2 GHz) with 32 GBytes of memory running 64-bit Red Hat
Enterprise Linux 4 (RHE4-64)
• Comparisons are between:
– Using 2 Opteron cores on their own
– Using all 8 Opteron cores on their own, and
– Using 2 Opteron cores with a single ClearSpeed Advance
accelerator board
• The notebooks are very new and we believe
there is more performance to come from the
accelerated versions with a bit more tuning
Example notebook descriptions
• ANOVA
– Analysis of variance, a linear least squares minimisation,
fitting a curve to sampled data
• Microarray
– Microarray data analysis, determines coexpression
networks – sets of genes that are commonly expressed
together under different experimental conditions.
Calculates a distance – distance metric
• ImageDecode
– Progressive decoding of images using the Haar wavelet
transform
• Spatial Auto Regression (SAR)
– Simple regressions iterating on large, dense matrices
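As background for the ImageDecode description, here is a minimal sketch of one level of the 1D Haar wavelet transform and of the idea behind progressive decoding. It is not the notebook's code, which is not shown in the talk; the function names and the sample signal are hypothetical. Each pair of samples is replaced by a scaled sum (the coarse approximation) and a scaled difference (the detail).

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling


def haar_forward(signal):
    """One level of the Haar transform: returns (approximation, detail)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * _SCALE for a, b in pairs]
    detail = [(a - b) * _SCALE for a, b in pairs]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the signal from approximation and detail coefficients."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * _SCALE)
        out.append((a - d) * _SCALE)
    return out


signal = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_forward(signal)

# Progressive decoding: a coarse version is available from the
# approximation alone; adding the detail restores the original.
coarse = haar_inverse(approx, [0.0] * len(detail))
full = haar_inverse(approx, detail)

print(coarse)
print(full)
```

For images the same step is applied along rows and then columns. One way to express that is as a pair of matrix products with a transform matrix, which would be a natural route into an accelerated matrix multiply, although whether the notebook does this is an assumption.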
Example – ANOVA
[Chart: ANOVA speedup. Time in seconds (lower is better) against number of predictors (500, 1000, 2000, 4000), comparing host speed (2 cores), host speed (8 cores), and 2 cores + accelerator.]
• ANOVA notebook benefits from 2X speedup with 4,000
predictors
• Two cores with a ClearSpeed accelerator are equivalent in
performance to an eight core machine!
Example – Microarray
[Chart: Microarray speedup. Time in seconds (lower is better) against yeast size (800^2, 1000^2, 2000^2, 4000^2), comparing host speed (2 cores), host speed (8 cores), and 2 cores + accelerator.]
• Microarray notebook benefits from nearly a 3X speedup with
4,000 inputs
• Larger problems may receive even more speedup
– Data sets with over 6,000 expression levels exist for yeast
Example – ImageDecode
[Chart: ImageDecode speedup. Time in seconds (lower is better) against image size (1024x1024, 1600x1200, 3072x2304), comparing host speed (2 cores), host speed (8 cores), and 2 cores + accelerator.]
• ImageDecode notebook speedup ranges from 2-3X
depending on the image size
• When tuned this speedup should also be achieved for
images around 960x960 in size (already around 1.6X)
Example – Spatial Auto Regression
[Chart: SAR speedup. Time in seconds (lower is better) for a problem size of 50 (0.5 GBytes), comparing host speed (2 cores), host speed (8 cores), and 2 cores + accelerator.]
• SAR notebook speedup is nearly 2X
• Larger problems should receive even more speedup
– Run-times are quite substantial too
Summary
• Accelerators can be used to significantly increase
performance and performance per watt across a range of
interesting applications in Mathematica
• You need a real 64-bit math accelerator for Mathematica
to deliver the precision you depend upon
• ClearSpeed can accelerate notebooks making intensive
use of Dot[], Det[], LUDecomposition[], LinearSolve[] and
Inverse[]
– More in the future as the libraries are developed
• Plug-and-play – no changes to your notebooks
• How fast can you go?
Recent news!
• ClearSpeed’s accelerators don’t just accelerate your
workstation or server, they can be used to build
supercomputers too!
• Announced this Monday: Tokyo Tech have accelerated
their Linux supercomputer, TSUBAME, from 38 TFLOPS
to 47 TFLOPS with 360 ClearSpeed Advance boards
– An increase in performance of 24%, but for just a 1% increase in
power consumption
Professor Matsuoka standing
beside TSUBAME at Tokyo Tech
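The TSUBAME figures can be checked with a few lines of arithmetic. The TFLOPS values and board count come from the slide, and the 25 W per board comes from the earlier product slide. The implied total system power is only an inference: it assumes the quoted 1% power increase is due entirely to the boards, which the talk does not state.

```python
before_tflops = 38.0
after_tflops = 47.0
boards = 360
board_watts = 25.0             # "around 25 Watts" per board
power_increase_fraction = 0.01  # "just a 1% increase in power consumption"

gain_tflops = after_tflops - before_tflops
percent_gain = 100.0 * gain_tflops / before_tflops
gflops_per_board = gain_tflops * 1000.0 / boards
added_kw = boards * board_watts / 1000.0

# Inference under the stated assumption, not a figure from the talk.
implied_system_kw = added_kw / power_increase_fraction

print(f"Performance gain:    {gain_tflops:.0f} TFLOPS ({percent_gain:.1f}%)")
print(f"Gain per board:      {gflops_per_board:.0f} GFLOPS")
print(f"Added power:         {added_kw:.0f} kW")
print(f"Implied system power: about {implied_system_kw:.0f} kW")
```

The percentage works out to just under 24%, matching the slide once rounded.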
A special offer at WTC06!
• If you want to put supercomputer technology in
your own machine, ClearSpeed has a special
offer at WTC:
– 37.5% discount available to the first twenty Wolfram
Technology Conference attendees purchasing a
ClearSpeed Advance accelerator board under the terms of
this limited offer
– $4,995 plus local sales taxes
• Talk to a ClearSpeed representative at the
conference to find out if your machine is
compatible
– Launching on x86 for Linux RHE3/4 & SLES9