Scientific Computing Beyond Matlab

advertisement
Scientific Computing Beyond
Matlab
Nov 19, 2012
Jason Su
Motivation
• I’m interested in (re-)coding a general solver for
sc/mcDESPOT relaxometry mapping
– Open source
– Extensibility to new/add’l sequences with better sensitivity
to certain parameters, e.g. B0 and MWF
– Better parallelization
• But:
– Large-scale code development in Matlab is cumbersome
– Matlab is slow
– C is hard (to write, read, debug)
• Creates large barrier for others to contribute
Matlab
Pros
• Ubiquitous, code is crossplatform
• Can be fast with vectorized
code
• Data visualization
• Quick development time
• Great IDE for general research
– Poor for large projects
• Many useful native
libraries/toolboxes
• Built-in profiling tools
Cons
• Requires license, not free
(though there is Octave)
• Vectorized code is often
non-intuitive to write and
hard to read
• Slow for general
computations
• Limited parallel computing
and GPU support
C/C++
Pros
• Fast
• Great IDEs for large coding
projects
– Not as great for general
science work
• Strong parallel computer
support and CUDA
• Community libraries for
scientific computing
• Profiling dependent on IDE
Cons
• High learning curve and
development time
• No data visualization
• Compiled code is platform
specific
• Compiler is not generally
installed with OSX and
Windows
Python
Pros
• Preinstalled with OSX and
Linux-based systems
• Readability is a core tenet
(“pythonic”)
• Quick development time
• Native parallel computing
support and community GPU
modules
• Extensive community support
– Including neuroimagingspecific: NiPype, NiBabel
• Built-in profiling module and
some IDE tools
Cons
• Slow for general
computation
• Mixed bag of IDEs, some are
great for coding, others for
research
• Out of the box it’s a poor
alternative: no linear
algebra or data visualization
Python & Friends
Cons
Solutions
• Slow for general computation
• Mixed bag of IDEs, some are great
for coding, others for research
• Cython, JIT compilers like PyPy
• There are a few good options out
there that I’ve found:
– Eclipse + PyDev, NetBeanz
– Spyder – closest to MATLAB
– Sage Math Notebook, IPython – like
Mathematica
– It may come down to preference.
• Out of the box it’s a poor
alternative: no linear algebra or
data visualization
• NumPy + SciPy + Matplotlib =
PyLab
– Sage Math includes these as well as
other capabilities like symbolic math
and graph theory
Pythonic?
• A term of praise used by the community to refer to clean code that
is readable, intuitive, explicit, and takes advantage of coding idioms
• Python
people = [‘John Doe’, ’Jane Doe’, ’John Smith’]
smith_family = []
for name in people:
if ‘Smith’ in name:
smith_family.append(name)
smith_family = [name for name in people if ‘Smith’ in name]
• Matlab
people = {‘John Doe’, ’Jane Doe’, ’John Smith’};
smith_family = {}
for name = people
if strfind(name{1},’Smith’)
smith_family = [smith_family name];
end
end
Installation
• On any OS:
– Sage Math (http://www.sagemath.org/), easy unzip
installation but many “extraneous” packages (500MB)
• Some issues on OSX with matplotlib
• On OSX:
– Use MacPorts to install Python (2.7), SciPy, matplotlib,
and Cython
• Requires gcc compiler available through Apple Developer
NumPy + SciPy vs Matlab
• Same core libraries: LAPACK
• Equivalent syntax but not trying to be similar
• http://www.scipy.org/
NumPy_for_Matlab_Users
• Key differences:
– Python uses 0 (zero) based indexing. The initial
element of a sequence is found using [0].
– In NumPy arrays have pass-by-reference
semantics. Slice operations are views into an array.
Syntax
Matlab
• a\b
• max(a(:))
• a(end-4:end)
• [0:9]
NumPy
• linalg.lstsq(a,b)
• a.max()
• a[-5:]
• arange(10.) or
r_[:10.]
Cython
• Requires a C compiler
• Cython is Python with C data types.
– Dynamic typing of Python has overhead, slow for computation
• Allows seamless coding of Python and embedded C-speed routines
• Python values and C values can be freely intermixed, with
conversions occurring automatically wherever possible
– This means for debugging C-level code, we can use all the plotting
tools available in Python
• Process is sort of like EPIC
1. Write a .pyx source file
2. Run the Cython compiler to generate a C file
3. Run a C compiler to generate a compiled library
4. Run the Python interpreter and ask it to import the module
Code Comparison – Matlab
• Let’s try a really basic speed comparison test
s = 0
tic
for i = 1:1e8
s = s + i;
end
toc
tic
x = 1:1e8;
sum(x)
toc
Code Comparison – C
#include <time.h>
#include <stdio.h>
int main()
{
long long unsigned int sum = 0;
long long unsigned int i = 0;
long long unsigned int max = 100000000;
clock_t tic =
for (i = 0; i
sum = sum
}
clock_t toc =
clock();
<= max; i++) {
+ i;
clock();
printf("%15lld, Elapsed: %f seconds\n", sum, (double)(toc - tic) /
CLOCKS_PER_SEC);
return 0;
}
Code Comparison – Python
import time
from numpy import *
s = 0
t = time.time()
for i in xrange(100000001):
s += i
print time.time() - t
t = time.time()
x = arange(100000001)
sum(x)
print time.time() - t
Code Comparison – Cython
• addCy.pyx
import time
cdef long long int n = 100000000
cdef long long int s = 0
cdef long long int i = 0
t = time.time()
for i in xrange(n+1):
s += i
print time.time() – t
• runCy.py
import pyximport; pyximport.install()
import addCy
Speed Comparison
Language/Implementation
Time (sec)
Matlab/For loop
Matlab/Vector sum
Python/For loop
Python/NumPy sum
0.547
0.817 (0.036 for sum only!)
15.944
0.648 (0.135 for sum only)
C/For loop
Cython/For loop
0.222
0.068 (!)
Summary
• Python
– Full featured programming language with an emphasis on
“pythonic” readability
• NumPy/SciPy
– Core libraries for linear algebra and computation (fft,
optimization)
• Cython
– Allows as much optimization as you want, degrading
gracefully from high-level Python to low-level C
– Profile, don’t over optimize too early!
Download