Exploring Parallelism with Python
Joseph Pantoga, Jon Simington

Issues Between Python and C
- Python is inherently slower than C
- Especially noticeable with libraries that build on Python's relationship with C / C++ code
- Thanks to the interpreter and the dynamic typing scheme
- Python 3 can be comparable to C in some respects, but is still slower in the average case (we use Python 2.7.10)
- Python too popular?
- So many developers with so many ideas lead to many incomplete projects, but also leave plenty of room for contribution

The Global Interpreter Lock
- A lock (mutex) in the CPython interpreter that lets only one thread execute Python bytecode at a time, protecting interpreter internals that are not thread-safe
- Limits parallelism: multiple threads run concurrently, but not in parallel
- Very little, if any, speedup for CPU-bound threads on a multiprocessor machine

    from threading import Thread

    def countdown(n):
        while n > 0:
            n -= 1

    count = 100000000

    # Sequential: 7.8s
    countdown(count)

    # 2 Threads: 15.4s
    t1 = Thread(target=countdown, args=(count//2,))
    t2 = Thread(target=countdown, args=(count//2,))
    t1.start(); t2.start()
    t1.join(); t2.join()

    # 4 Threads: 15.7s
    t1 = Thread(target=countdown, args=(count//4,))
    t2 = Thread(target=countdown, args=(count//4,))
    t3 = Thread(target=countdown, args=(count//4,))
    t4 = Thread(target=countdown, args=(count//4,))
    t1.start(); t2.start(); t3.start(); t4.start()
    t1.join(); t2.join(); t3.join(); t4.join()

- The GIL ruins everything!
- Thread-based parallelism is often not worth it with Python
*Test completed on a 3.1GHz x4 machine with Python 2.7.10

Getting around the GIL
- Make calls to outside libraries and circumvent the interpreter's rules entirely
- Python modules that call external C libraries have inherent latency
- BUT! In certain cases, Python + C MPI performance can be comparable to the native C libraries

How does Python + C compare to C?
- The following was tested on the Beowulf-class cluster `Geronimo` at CIMEC: ten Intel P4 2.4GHz processors, each with 1GB of DDR 333MHz RAM, connected by a 100Mbps Ethernet switch.
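For scale, here is a back-of-the-envelope lower bound (our own arithmetic, not a result from the paper). The exchange benchmark that follows sends a buffer of 2^20 Float64 values, 8 bytes each. Over the cluster's 100Mbps link, ignoring latency and protocol overhead, one transfer of that buffer takes at least:

$$ 2^{20} \times 8\ \text{bytes} = 8\,388\,608\ \text{bytes} = 67\,108\,864\ \text{bits} $$

$$ t_{\text{transfer}} \geq \frac{67\,108\,864\ \text{bits}}{100 \times 10^{6}\ \text{bits/s}} \approx 0.67\ \text{s} $$

Because each rank sends and receives one after the other, the timed region on each rank should take roughly twice that, about 1.34 s, before any Python or MPI overhead is added.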
The mpi4py library was compiled with MPICH 1.2.6.

    # Early (2005-era) mpi4py API and numarray, as used in the source paper
    from mpi4py import mpi
    import numarray as na

    # Send buffer: 2**20 double-precision values
    sbuff = na.array(shape=2**20, type=na.Float64)

    wt = mpi.Wtime()
    if mpi.even:
        # Even ranks send to their odd neighbor first, then receive
        mpi.WORLD.Send(sbuff, mpi.rank + 1)
        rbuff = mpi.WORLD.Recv(mpi.rank + 1)
    else:
        # Odd ranks receive first, then send back
        rbuff = mpi.WORLD.Recv(mpi.rank - 1)
        mpi.WORLD.Send(sbuff, mpi.rank - 1)
    wt = mpi.Wtime() - wt

    # Collect every rank's elapsed time on rank 0
    tp = mpi.WORLD.Gather(wt, root=0)
    if mpi.zero:
        print tp

http://www.cimec.org.ar/ojs/index.php/cmm/article/viewFile/8/11

The rest of the graphs display time analysis from similar programs, with only the MPI instruction differing.

[Graphs omitted: Python + C vs. C timing results from the source above]

- For large data sets, Python performs very similarly to C
- Python achieves less bandwidth, since mpi4py relies on a C MPI library for all networking calls and adds overhead on top of it
- But, in general, Python is slower than C

Python's Parallel Programming Libraries
- Message Passing Interface (MPI)
  - pyMPI
  - mpi4py (uses the C MPI library directly)
  - Pypar
  - Scientific Python (MPIlib)
  - MYMPI
- Bulk Synchronous Parallel (BSP)
  - Scientific Python (BSPlib)

pyMPI
- Almost-full MPI instruction set
- Requires a modified Python interpreter that allows for ‘interactive’ parallelism
- Not maintained since 2013
- The modified interpreter is the parallel application, so you have to recompile the interpreter whenever you want to do different tasks

Pydusa (formerly MYMPI)
- 33KB Python module, so there is no custom Python interpreter to maintain
- While the MPI Standard contains 120+ routines, MYMPI contains 35 “important” MPI routines
- Syntax is very similar to the Fortran and C MPI libraries
- Your Python code is the parallel application

pypar
- No modified interpreter needed!
- Still maintained on GitHub
- Only a few MPI interfaces are implemented
- Can't handle topologies well and prefers simple data structures in parallel calculations

mpi4py
- Still being maintained on Bitbucket (updated 11/23/2015)
- Makes calls to external C MPI functions; each MPI rank runs as a separate process with its own interpreter, so the GIL does not serialize work across ranks
- Attempts to borrow ideas from other popular modules and integrate them together

Scientific Python
- GREAT documentation: easy to use with their examples
- Supports both MPI and BSP
- Requires installation of both an MPI and a BSP library

Is Parallelism Fully Implemented?
- From our research so far, we have not found a publicly available Python package that implements the full MPI instruction set
- Not all popular languages have complete and extensive libraries for every task or use case!

Conclusion
- You CAN create parallel programs and applications with Python
- Doing so efficiently can require compiling a large custom Python interpreter
- Should they try to keep it in future versions, or even maintain the current implementations?
- From our research, it seems the community has done just about all it could to bring parallelism to Python, but some sacrifices have to be made, mainly a restriction on which data types can and can't be supported

Conclusion Cont.
- Maybe Python isn't the best language to implement parallel algorithms in, but there are many other languages besides C and Fortran with interesting approaches to solving parallel problems

Julia
- Really good documentation for parallel tasks, with examples
- Able to send a task to n connected computers and receive the results back asynchronously, both on request and automatically when the task completes
- Has predefined topology configurations for networks, such as all-to-all and master-slave
- Allows custom worker configurations to fit your specific topology

Go
- Fairly good documentation, along with an interactive playground on the website for learning the basics without installing anything
- The initial installation comes with everything required for parallel coding, so there are no extra libraries to search for or install
- Lightweight and easy to learn
- Can write several parallel programs using simple functions in Go

Questions?

Sources
http://www.researchgate.net/profile/Mario_Storti/publication/220380647_MPI_for_Python/links/00b495242ba3b30eb3000000.pdf
http://www.researchgate.net/profile/Leesa_Brieger/publication/221134069_MYMPI__MPI_programming_in_Python/links/0c960521cd051bc649000000.pdf
http://uni.getrik.com/wp-content/uploads/2010/04/pyMPI.pdf
http://www.researchgate.net/profile/Konrad_Hinsen/publication/220439974_HighLevel_Parallel_Software_Development_with_Python_and_BSP/links/09e4150c048e4e7cd8000000.pdf
http://www.researchgate.net/profile/Ola_Skavhaug/publication/222545480_Using_B_SP_and_Python_to_simplify_parallel_programming/links/0fcfd507e6cac3eb63000000.pdf
http://downloads.hindawi.com/journals/sp/2005/619804.pdf
http://geco.mines.edu/workshop/aug2010/slides/fri/mympi.pdf
http://sourceforge.net/projects/pydusa/
http://docs.julialang.org/en/latest/manual/parallel-computing/
http://dirac.cnrs-orleans.fr/plone/software/scientificpython
http://dirac.cnrs-orleans.fr/ScientificPython/ScientificPythonManual/