Scientific Computing Beyond Matlab Nov 19, 2012 Jason Su Motivation • I’m interested in (re-)coding a general solver for sc/mcDESPOT relaxometry mapping – Open source – Extensibility to new/add’l sequences with better sensitivity to certain parameters, e.g. B0 and MWF – Better parallelization • But: – Large-scale code development in Matlab is cumbersome – Matlab is slow – C is hard (to write, read, debug) • Creates large barrier for others to contribute Matlab Pros • Ubiquitous, code is crossplatform • Can be fast with vectorized code • Data visualization • Quick development time • Great IDE for general research – Poor for large projects • Many useful native libraries/toolboxes • Built-in profiling tools Cons • Requires license, not free (though there is Octave) • Vectorized code is often non-intuitive to write and hard to read • Slow for general computations • Limited parallel computing and GPU support C/C++ Pros • Fast • Great IDEs for large coding projects – Not as great for general science work • Strong parallel computer support and CUDA • Community libraries for scientific computing • Profiling dependent on IDE Cons • High learning curve and development time • No data visualization • Compiled code is platform specific • Compiler is not generally installed with OSX and Windows Python Pros • Preinstalled with OSX and Linux-based systems • Readability is a core tenet (“pythonic”) • Quick development time • Native parallel computing support and community GPU modules • Extensive community support – Including neuroimagingspecific: NiPype, NiBabel • Built-in profiling module and some IDE tools Cons • Slow for general computation • Mixed bag of IDEs, some are great for coding, others for research • Out of the box it’s a poor alternative: no linear algebra or data visualization Python & Friends Cons Solutions • Slow for general computation • Mixed bag of IDEs, some are great for coding, others for research • Cython, JIT compilers like PyPy • There are a few good options out there that I’ve found: – Eclipse + PyDev, NetBeanz – Spyder – closest to MATLAB – Sage Math Notebook, IPython – like Mathematica – It may come down to preference. • Out of the box it’s a poor alternative: no linear algebra or data visualization • NumPy + SciPy + Matplotlib = PyLab – Sage Math includes these as well as other capabilities like symbolic math and graph theory Pythonic? • A term of praise used by the community to refer to clean code that is readable, intuitive, explicit, and takes advantage of coding idioms • Python people = [‘John Doe’, ’Jane Doe’, ’John Smith’] smith_family = [] for name in people: if ‘Smith’ in name: smith_family.append(name) smith_family = [name for name in people if ‘Smith’ in name] • Matlab people = {‘John Doe’, ’Jane Doe’, ’John Smith’}; smith_family = {} for name = people if strfind(name{1},’Smith’) smith_family = [smith_family name]; end end Installation • On any OS: – Sage Math (http://www.sagemath.org/), easy unzip installation but many “extraneous” packages (500MB) • Some issues on OSX with matplotlib • On OSX: – Use MacPorts to install Python (2.7), SciPy, matplotlib, and Cython • Requires gcc compiler available through Apple Developer NumPy + SciPy vs Matlab • Same core libraries: LAPACK • Equivalent syntax but not trying to be similar • http://www.scipy.org/ NumPy_for_Matlab_Users • Key differences: – Python uses 0 (zero) based indexing. The initial element of a sequence is found using [0]. – In NumPy arrays have pass-by-reference semantics. Slice operations are views into an array. Syntax Matlab • a\b • max(a(:)) • a(end-4:end) • [0:9] NumPy • linalg.lstsq(a,b) • a.max() • a[-5:] • arange(10.) or r_[:10.] Cython • Requires a C compiler • Cython is Python with C data types. – Dynamic typing of Python has overhead, slow for computation • Allows seamless coding of Python and embedded C-speed routines • Python values and C values can be freely intermixed, with conversions occurring automatically wherever possible – This means for debugging C-level code, we can use all the plotting tools available in Python • Process is sort of like EPIC 1. Write a .pyx source file 2. Run the Cython compiler to generate a C file 3. Run a C compiler to generate a compiled library 4. Run the Python interpreter and ask it to import the module Code Comparison – Matlab • Let’s try a really basic speed comparison test s = 0 tic for i = 1:1e8 s = s + i; end toc tic x = 1:1e8; sum(x) toc Code Comparison – C #include <time.h> #include <stdio.h> int main() { long long unsigned int sum = 0; long long unsigned int i = 0; long long unsigned int max = 100000000; clock_t tic = for (i = 0; i sum = sum } clock_t toc = clock(); <= max; i++) { + i; clock(); printf("%15lld, Elapsed: %f seconds\n", sum, (double)(toc - tic) / CLOCKS_PER_SEC); return 0; } Code Comparison – Python import time from numpy import * s = 0 t = time.time() for i in xrange(100000001): s += i print time.time() - t t = time.time() x = arange(100000001) sum(x) print time.time() - t Code Comparison – Cython • addCy.pyx import time cdef long long int n = 100000000 cdef long long int s = 0 cdef long long int i = 0 t = time.time() for i in xrange(n+1): s += i print time.time() – t • runCy.py import pyximport; pyximport.install() import addCy Speed Comparison Language/Implementation Time (sec) Matlab/For loop Matlab/Vector sum Python/For loop Python/NumPy sum 0.547 0.817 (0.036 for sum only!) 15.944 0.648 (0.135 for sum only) C/For loop Cython/For loop 0.222 0.068 (!) Summary • Python – Full featured programming language with an emphasis on “pythonic” readability • NumPy/SciPy – Core libraries for linear algebra and computation (fft, optimization) • Cython – Allows as much optimization as you want, degrading gracefully from high-level Python to low-level C – Profile, don’t over optimize too early!