why to become a Pyologist
Perl is for plumbers – Python is for biologists
Stefan Maetschke
Teasdale Group

why
why, why, why …
• Biologists suffer for no good reason
• Perl is difficult to write and read
• Perl gives weak error feedback
• Perl obscures basic concepts
• Limited understanding of principles
• Low productivity
• Reduced research scope
• Perl is for plumbers - Python is for scientists
• I want to have an easy life

plumbers and others
spectrum of tasks, tools and roles

    role     sys admin     scientist     SW developer
    task     plumbing                    designing
    tools    vi            Python        Emacs/IDE
             awk/Perl                    C/C++/Java
             grep/diff                   UML/Unit test

equals( , )

                      Python                     Perl
    created by        Guido van Rossum           Larry Wall
    first released    1991                       1987
    motto             There should be one way    There's more than one way
                      Easy                       Difficult

    both: Cross-platform, open-source, scripting language, multi-paradigm,
          dynamic typing, statement ratio: 6

you must be joking!

Nested lists, Python:
    list = [['a', 'b', 'c'], [1, 2, 3]]
    print(list[0])
Nested lists, Perl:
    @list = (['a', 'b', 'c'], [1, 2, 3]);
    print "@{$list[0]}\n";

List in a hash, Python:
    list = ['a', 'b', 'c']
    hash = {}
    hash['letters'] = list
    print(hash['letters'])
List in a hash, Perl:
    my @list = ('a', 'b', 'c');
    my %hash;
    $hash{'letters'} = \@list;    # store a reference to the array
    print "@{$hash{'letters'}}\n";

Class, Python:
    class Person:
        def __init__(self, age):
            self.age = age
Class, Perl:
    package Person;
    use strict;
    sub new {
        my $class = shift;
        my $age = shift or die "Must pass age";
        my $rSelf = {'age' => $age};
        bless ($rSelf, $class);
        return $rSelf;
    }

http://www.strombergers.com/python/

More Perl bashing…

Adding two numbers, Python:
    def add(a, b):
        return a + b
Adding two numbers, Perl (four ways to write the same function):
    sub add { $_[0] + $_[1]; }
    sub add { my ($a, $b) = @_; return $a + $b; }
    sub add($$) { local ($a, $b) = @_; return $a + $b; }
    sub add { my $a = shift; my $b = shift; return $a + $b; }

Difference of two list lengths, Python:
    def diff(a, b):
        return len(a) - len(b)
Difference of two list lengths, Perl:
    sub diff {
        my ($aref, $bref) = @_;    # arrays must be passed as references
        my (@a) = @$aref;
        my (@b) = @$bref;
        return scalar(@a) - scalar(@b);
    }

http://www.strombergers.com/python/

complexity wall
• everything you can do in Python you can do in Perl, but you don't
• simple scripts, ≈ 100 lines => fun stops
• Higher order concepts: Data structures, Functions, Classes
=> Python allows you to break through the complexity wall

googliness
Search hits in kilo-hits, May 2008; X is replaced by the language name.

    query                C        Java     C++      C#       Perl     Python   Ruby     Scala    Haskell
    X language           53,000   7,760    1,290    1,020    1,150    527      470      394      212
    X load file          1,820    2,890    3,100    794      685      798      806      354      323
    X bioinformatics     572      320      231      161      101      199      186      69       74

and the winner is…
[benchmark chart; annotation on the chart: "<- without Psyco"]
http://shootout.alioth.debian.org/

damn lies and stats
• sourceforge projects: Perl declining, Python increasing?
• May 2008, keyword search: Perl 3474, Python 4063
http://rengelink.textdriven.com/blog/

see the light…
classify Iris plants
Three species:
• Iris setosa
• Iris versicolor
• Iris virginica
Four attributes:
• sepal length
• sepal width
• petal length
• petal width
Fisher, R.A. "The use of multiple measurements in taxonomic problems" Annals of Eugenics, 7, Part II, 179-188 (1936)
http://archive.ics.uci.edu/ml/datasets/Iris

Iris – convert data

Iris – correlation

Iris – do stats

Iris – linear regression

Iris – plot data

libs for life science
• Scientific computing: SciPy, NumPy, matplotlib
• Bioinformatics: BioPython
• Phylogenetic trees: Mavric, Plone, P4, Newick
• Microarrays: SciGraph, CompClust
• Molecular modeling: MMTK, OpenBabel, CDK, RDKit, cinfony, mmLib
• Dynamic systems modeling: PyDSTool
• Protein structure visualization: PyMOL, UCSF Chimera
• Networks/Graphs: NetworkX, PyGraphViz
• Symbolic math: SymPy, Sage
• Wrapper for C/C++ code: SWIG, Pyrex, Cython
• R/SPlus interface: RSPython, RPy
• Java interface: Jython
• Fortran to Python: F2PY
• …
Also check out: http://www.scipy.org/Topical_Software and http://pypi.python.org/pypi

last words
• Perl perfect for plumbing
• Python excellent for scientific programming
• Easy to learn, write and maintain
• Suited for scripting and mid-size projects
• Huge number of scientific libraries
• Python is an attractive alternative to Matlab/R
• Easy integration of Java, C/C++ or Fortran code

questions
isn't Python lovely…
Interest: Python Course?
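The Iris steps named on the slides above (convert data, correlation, do stats, linear regression) can be sketched in a few lines of Python. This is a minimal sketch and not the code shown in the talk: it assumes the comma-separated layout of the UCI iris.data file, uses a handful of rows as illustrative input only, and sticks to the standard library instead of SciPy/NumPy so that it runs anywhere. The helper names (load, pearson, fit_line, mean_by_species) are hypothetical.

```python
import csv
import io
from statistics import mean

# Illustrative input only: a few rows in the layout of the UCI iris.data file
# (sepal length, sepal width, petal length, petal width, species).
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
"""


def load(text):
    """Convert data: parse CSV text into ([four floats], species) records."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if fields:  # skip blank lines
            rows.append(([float(x) for x in fields[:4]], fields[4]))
    return rows


def pearson(xs, ys):
    """Correlation: Pearson coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def fit_line(xs, ys):
    """Linear regression: least-squares slope and intercept of y on x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def mean_by_species(rows, column):
    """Do stats: mean of one attribute column for each species."""
    groups = {}
    for values, species in rows:
        groups.setdefault(species, []).append(values[column])
    return {species: mean(vals) for species, vals in groups.items()}


rows = load(SAMPLE)
petal_length = [values[2] for values, _ in rows]
petal_width = [values[3] for values, _ in rows]

r = pearson(petal_length, petal_width)
slope, intercept = fit_line(petal_length, petal_width)
petal_means = mean_by_species(rows, 2)

print("correlation petal length/width: %.3f" % r)
print("petal width = %.3f * petal length + %.3f" % (slope, intercept))
print(petal_means)
```

To run it on the full data set, pass the contents of the downloaded iris.data file to load() instead of SAMPLE. The "plot data" step is left out because it needs a plotting library such as matplotlib.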
links
• Wikipedia – Python: http://en.wikipedia.org/wiki/Python
• Instant Python: http://hetland.org/writing/instant-python.html
• How to think like a computer scientist: http://openbookproject.net//thinkCSpy/
• Dive into Python: http://www.diveintopython.org/
• Python course in bioinformatics: http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html
• Beginning Python for bioinformatics: http://www.onlamp.com/pub/a/python/2002/10/17/biopython.html
• SciPy Cookbook: http://www.scipy.org/Cookbook
• Matplotlib Cookbook: http://www.scipy.org/Cookbook/Matplotlib
• Biopython tutorial and cookbook: http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.html
• Huge collection of Python tutorials: http://www.awaretek.com/tutorials.html
• What's wrong with Perl: http://www.garshol.priv.no/download/text/perl.html
• 20 Stages of Perl to Python conversion: http://aspn.activestate.com/ASPN/Mail/Message/python-list/1323993
• Why Python: http://www.linuxjournal.com/article/3882

some papers
• Bassi S. (2007) A Primer on Python for Life Science Researchers. PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199
• Mangalam H. (2002) The Bio* toolkits--a brief overview. Brief Bioinform. 3(3):296-302.
• Fourment M., Gillings MR. (2008) A comparison of common programming languages used in bioinformatics. BMC Bioinformatics 9:82.
to whom it may concern
NP = Non-Programmer
• NPs who don't use Perl yet
• NPs who want to see the light
• NPs who want to give their code away without being rightfully ashamed
• Matlab aficionados

one of ten Perl myths
http://www.perl.com/pub/a/2000/01/10PerlMyths.html
"…we can happily consign the idea that 'Perl is hard' to mythology."
Swap two sections of a string: "aaa:bbb" -> "bbb:aaa"
"…Perl works the way you do…"

Perl:
    while (<>) { s/(.*):(.*)/$2:$1/; print; }

    while (<>) {
        chomp;
        ($first, $second) = split /:/;
        print $second, ":", $first, "\n";
    }

"…That's one, fairly natural way to think about it…"

Python:
    # file is any open file object, e.g. sys.stdin
    for line in file:
        line = line.strip()
        first, second = line.split(':')
        print(second + ':' + first)

    from re import sub
    for line in file:
        # line still ends with its newline, so print must not add another
        print(sub(r'(.*):(.*)', r'\2:\1', line), end='')

camel chaos
• does not scale well
• complex syntax
• cryptic commands
• does not encourage clear code
• difficult to read/maintain
• hard to understand the principles
• error prone
• no check of subroutine arguments
• variables are global by default
• …

why Python
• overcome the complexity wall
• many, excellent scientific libraries
• clear, easy to learn syntax
• hard to do it wrong
• does not require prior suffering/experience

my bias
• R&D: C/C++ -> applied ML in robotics, image processing, quality control
• SW Development: Java -> Speech Processing, Data Mining
• Computational Biology: Java, Python
• Other languages I played with: Ada, APL, Basic, MatLab, Modula, Pascal, Perl, Prolog, R, Groovy, Forth, Fortran, Scala, Assembly code
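Two of the "camel chaos" complaints, "no check of subroutine arguments" and "variables are global by default", have a direct counterpart on the Python side. The following is a minimal sketch, not taken from the slides, of how Python behaves in both cases; the names add, total and message are chosen for illustration only.

```python
def add(a, b):
    # Assigning inside a function creates a local variable by default,
    # so this 'total' is unrelated to the module-level 'total' below.
    total = a + b
    return total


total = 0
result = add(2, 3)

# Calling with the wrong number of arguments is rejected at call time.
message = None
try:
    add(2)
except TypeError as error:
    message = str(error)

print("result:", result)
print("module-level total is still:", total)
print("error for add(2):", message)
```

In Perl the corresponding sub would read its arguments from @_ and silently see undef for the missing one, unless a prototype or an explicit check is added, and a variable declared without my would be a package global.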