
why to become a Pyologist
Perl is for plumbers – Python is for biologists
Stefan Maetschke
Teasdale Group
1
why
why, why, why …

Biologists suffer for no good reason








Perl is difficult to write and read
Perl gives weak error feedback
Perl obscures basic concepts
Limited understanding of principles
Low productivity
Reduced research scope
Perl is for plumbers - Python is for scientists
I want to have an easy life
2
plumbers and others
spectrum of tasks, tools and roles
sys admin: plumbing, vi, awk/Perl, grep/diff
scientist: Python
SW developer: designing, Emacs/IDE, C/C++/Java, UML/Unit test
3
equals( , )
                 Python                     Perl
Creator          Guido van Rossum           Larry Wall
First release    1991                       1987
Philosophy       There should be one way    There's more than one way
Learning         Easy                       Difficult
Both: cross-platform, open-source, scripting language, multi-paradigm, dynamic typing, statement ratio: 6
4
you must be joking!
# Python: nested list
list = [['a', 'b', 'c'], [1, 2, 3]]
print(list[0])

# Perl: nested list
@list = (['a', 'b', 'c'], [1, 2, 3]);
print "@{$list[0]}\n";

# Python: list stored in a hash
list = ['a', 'b', 'c']
hash = {}
hash['letters'] = list
print(hash['letters'])

# Perl: list stored in a hash
my @list = ('a', 'b', 'c');
my %hash;
$hash{'letters'} = \@list;
print "@{$hash{'letters'}}\n";

# Python: class with a constructor
class Person:
    def __init__(self, age):
        self.age = age

# Perl: class with a constructor
package Person;
use strict;
sub new {
    my $class = shift;
    my $age = shift or die "Must pass age";
    my $rSelf = {'age' => $age};
    bless ($rSelf, $class);
    return $rSelf;
}
http://www.strombergers.com/python/
5
More Perl bashing…
# Python
def add(a, b):
    return a + b

# Perl: four ways to write the same function
sub add {
    $_[0] + $_[1];
}
sub add {
    my ($a, $b) = @_;
    return $a + $b;
}
sub add($$) {
    local ($a, $b) = @_;
    return $a + $b;
}
sub add {
    my $a = shift;
    my $b = shift;
    return $a + $b;
}

# Python
def diff(a, b):
    return len(a) - len(b)

# Perl
sub diff {
    my ($aref, $bref) = @_;
    my (@a) = @$aref;
    my (@b) = @$bref;
    return scalar(@a) - scalar(@b);
}
http://www.strombergers.com/python/
6
complexity wall
everything you can do in Python you can do in Perl but you don’t
simple scripts ≈ 100 lines => fun stops
Higher order concepts: data structures, functions, classes
=> Python allows you to break through the complexity wall
7
googliness
Google kilo-hits, May 2008 (X = language name)

Language    X language    X load file    X bioinformatics
C               53,000          1,820                 572
Java             7,760          2,890                 320
C++              1,290          3,100                 231
C#               1,020            794                 161
Perl             1,150            685                 101
Python             527            798                 199
Ruby               470            806                 186
Scala              394            354                  69
Haskell            212            323                  74
8
and the winner is…
<- without Psyco
http://shootout.alioth.debian.org/
9
damn lies and stats
SourceForge projects
• Perl declining, Python increasing?
• May 2008, keyword search: Perl 3474, Python 4063
http://rengelink.textdriven.com/blog/
10
see the light…
classify Iris plants
Three species:
• Iris setosa
• Iris versicolor
• Iris virginica
Four attributes:
• sepal length
• sepal width
• petal length
• petal width
Fisher, R.A.
"The use of multiple measurements in taxonomic problems"
Annals of Eugenics, 7, Part II, 179-188 (1936)
http://archive.ics.uci.edu/ml/datasets/Iris
11
Iris – convert data
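A minimal sketch of the conversion step, assuming the comma-separated layout of the UCI iris.data file (four measurements followed by the species name). The function name `load_iris`, the field names and the handful of sample rows are illustrative choices, not taken from the slide.

```python
import csv
import io

# A few sample rows in the layout of the UCI iris.data file:
# sepal length, sepal width, petal length, petal width, species
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

# Hypothetical field names for the four numeric columns
FIELDS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def load_iris(handle):
    """Return a list of (measurements, species) tuples read from a file-like object."""
    records = []
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        values = [float(x) for x in row[:4]]  # text -> numbers
        records.append((dict(zip(FIELDS, values)), row[4]))
    return records


records = load_iris(io.StringIO(SAMPLE))
for measurements, species in records:
    print(species, measurements["petal_length"])
```

With the real data set, `io.StringIO(SAMPLE)` would be replaced by an opened file, for example `open("iris.data")`.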
12
Iris – correlation
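A sketch of how a correlation between two attributes could be computed, here the Pearson coefficient written out with the standard library only. The helper name `pearson` and the nine sample values (petal length and width, three plants per species) are assumptions for illustration; libraries such as NumPy or SciPy offer the same calculation ready-made.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Sample values: three plants each of setosa, versicolor, virginica
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1]

r = pearson(petal_length, petal_width)
print("correlation petal length vs. petal width: %.3f" % r)
```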
13
Iris – do stats
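A sketch of simple per-species statistics, assuming the data is already available as (petal length, species) pairs. It uses the standard `statistics` module; the variable names and the small sample are illustrative only.

```python
from collections import defaultdict
from statistics import mean, stdev

# Sample (petal length, species) pairs, three plants per species
rows = [
    (1.4, "Iris-setosa"), (1.4, "Iris-setosa"), (1.3, "Iris-setosa"),
    (4.7, "Iris-versicolor"), (4.5, "Iris-versicolor"), (4.9, "Iris-versicolor"),
    (6.0, "Iris-virginica"), (5.1, "Iris-virginica"), (5.9, "Iris-virginica"),
]

# Group the measurements by species
by_species = defaultdict(list)
for petal_length, species in rows:
    by_species[species].append(petal_length)

# Mean and sample standard deviation for each species
summary = {sp: (mean(vals), stdev(vals)) for sp, vals in by_species.items()}

for sp, (m, s) in sorted(summary.items()):
    print("%-16s mean=%.2f stdev=%.2f" % (sp, m, s))
```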
14
Iris – linear regression
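A sketch of an ordinary least-squares line fit, predicting petal width from petal length. The function `fit_line` and the sample values are hypothetical; in practice a library routine from SciPy or NumPy would likely be used instead.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Sample values: three plants each of setosa, versicolor, virginica
petal_length = [1.4, 1.4, 1.3, 4.7, 4.5, 4.9, 6.0, 5.1, 5.9]
petal_width = [0.2, 0.2, 0.2, 1.4, 1.5, 1.5, 2.5, 1.9, 2.1]

slope, intercept = fit_line(petal_length, petal_width)
print("petal_width = %.3f * petal_length + %.3f" % (slope, intercept))
```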
15
Iris – plot data
16
libs for life science














Scientific computing: SciPy, NumPy, matplotlib
Bioinformatics: BioPython
Phylogenetic trees: Mavric, Plone, P4, Newick
Microarrays: SciGraph, CompClust
Molecular modeling: MMTK, OpenBabel, CDK, RDKit, cinfony, mmLib
Dynamic systems modeling: PyDSTools
Protein structure visualization: PyMol, UCSF Chimera
Networks/Graphs: NetworkX, PyGraphViz
Symbolic math: SymPy, Sage
Wrapper for C/C++ code: SWIG, Pyrex, Cython
R/SPlus interface: RSPython, RPy
Java interface: Jython
Fortran to Python: F2PY
…
Also check out:
http://www.scipy.org/Topical_Software
and:
http://pypi.python.org/pypi
17
last words


Perl perfect for plumbing
Python excellent for scientific programming





Easy to learn, write and maintain
Suited for scripting and mid-size projects
Huge number of scientific libraries
Python is an attractive alternative to Matlab/R
Easy integration of Java, C/C++ or Fortran code
18
questions
isn’t Python lovely…
Interest:
Python Course?
19
20
links












Wikipedia – Python
http://en.wikipedia.org/wiki/Python
Instant Python
http://hetland.org/writing/instant-python.html
How to think like a computer scientist
http://openbookproject.net//thinkCSpy/
Dive into Python
http://www.diveintopython.org/
Python course in bioinformatics
http://www.pasteur.fr/recherche/unites/sis/formation/python/index.html
Beginning Python for bioinformatics
http://www.onlamp.com/pub/a/python/2002/10/17/biopython.html
SciPy Cookbook
http://www.scipy.org/Cookbook
Matplotlib Cookbook
http://www.scipy.org/Cookbook/Matplotlib
Biopython tutorial and cookbook
http://www.bioinformatics.org/bradstuff/bp/tut/Tutorial.html
Huge collection of Python tutorials
http://www.awaretek.com/tutorials.html
What’s wrong with Perl
http://www.garshol.priv.no/download/text/perl.html
20 Stages of Perl to Python conversion
http://aspn.activestate.com/ASPN/Mail/Message/python-list/1323993
Why Python
http://www.linuxjournal.com/article/3882
21
some papers



Bassi S. (2007)
A Primer on Python for Life Science Researchers.
PLoS Comput Biol 3(11): e199. doi:10.1371/journal.pcbi.0030199
Mangalam H. (2002)
The Bio* toolkits--a brief overview.
Brief Bioinform. 3(3):296-302.
Fourment M., Gillings MR. (2008)
A comparison of common programming languages used in bioinformatics.
BMC Bioinformatics 9:82.
22
to whom it may concern
NP = Non-Programmer




NPs who don’t use Perl yet
NPs who want to see the light
NPs who want to give their code away
without being rightfully ashamed
Matlab aficionados
23
one of ten Perl myths
http://www.perl.com/pub/a/2000/01/10PerlMyths.html
“…we can happily consign the idea that ‘Perl is hard’ to mythology.”
Swap two sections of a string: “aaa:bbb” -> “bbb:aaa”
“…Perl works the way you do…”
# Perl, regular expression
while (<>) {
    s/(.*):(.*)/$2:$1/;
    print;
}

# Perl, split
while (<>) {
    chomp;
    ($first, $second) = split /:/;
    print $second, ":", $first, "\n";
}
“…That's one, fairly natural way to think about it…”
# Python, split (file is an already opened file object)
for line in file:
    line = line.strip()
    first, second = line.split(':')
    print(second + ':' + first)

# Python, regular expression
from re import sub
for line in file:
    print(sub('(.*):(.*)', r'\2:\1', line.strip()))
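The Python snippets on this slide assume an already opened `file`. A self-contained sketch of both variants, with an in-memory file standing in for real input, might look as follows; the function names `swap_split` and `swap_regex` are made up for illustration.

```python
import io
import re


def swap_split(line):
    """Swap the two colon-separated parts using split."""
    first, second = line.strip().split(":")
    return second + ":" + first


def swap_regex(line):
    """Swap the two colon-separated parts using a regular expression."""
    return re.sub(r"(.*):(.*)", r"\2:\1", line.strip())


# In-memory stand-in for an opened text file
lines = io.StringIO("aaa:bbb\nfoo:bar\n")
for line in lines:
    print(swap_split(line), swap_regex(line))
```

Note that `swap_split` raises a `ValueError` for lines without exactly one colon, while `swap_regex` returns such lines unchanged or swaps around the last colon.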
24
camel chaos







does not scale well
complex syntax
cryptic commands
does not encourage clear code
difficult to read/maintain
hard to understand the principles
error prone



no check of subroutine arguments
variables are global by default
…
25
why Python





overcome the complexity wall
many, excellent scientific libraries
clear, easy to learn syntax
hard to do it wrong
does not require prior suffering/experience
26
my bias




R&D: C/C++ ->
applied ML in robotics, image processing, quality control
SW Development: Java ->
Speech Processing, Data Mining
Computational Biology: Java, Python
Other languages I played with:
Ada, APL, Basic, MatLab, Modula, Pascal, Perl, Prolog, R,
Groovy, Forth, Fortran, Scala, Assembly code
27