A Brief Introduction to Scientific Programming with Python
TCD, 26/08/2015
Karsten Hokamp, PhD
TCD Bioinformatics Support Team
Trinity College Dublin, The University of Dublin
Overview
• Programming
• First Python script/program
• Why Python?
• Bioinformatics examples
• Additional resources
• Outlook
What is programming and why bother?
Data processing
Automation
Combination of programs for analysis pipelines
More control and flexibility
Better understanding of how programs work
Programming Concepts
Become a very meticulous problem solver
Break problems into small details
Keep it variable (use variables rather than fixed values)
Give very precise instructions
Programming Concepts
"human" recipe
Programming Concepts
"computerised" recipe
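As a minimal sketch of the difference, the snippet below writes a recipe the "computerised" way: quantities are held in variables and every step is an explicit instruction. The tea recipe, the function name make_tea and all numbers are illustrative assumptions, not taken from the slides.

```python
# Hypothetical example: a "computerised" recipe for tea.
# All names and quantities are made up for illustration.

def make_tea(cups, steep_minutes=3):
    """Return the list of precise steps needed to make `cups` cups of tea."""
    water_ml = 250 * cups      # a variable instead of "some water"
    tea_bags = cups            # one tea bag per cup
    steps = [
        f"Boil {water_ml} ml of water",
        f"Put {tea_bags} tea bag(s) into the pot",
        "Pour the boiling water into the pot",
        f"Wait {steep_minutes} minutes",
        "Remove the tea bag(s)",
    ]
    return steps

# Print the numbered steps for two cups.
for number, step in enumerate(make_tea(cups=2), start=1):
    print(number, step)
```

Changing the value of cups adapts every dependent quantity, which is the point of "keeping it variable".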
Mac for Windows users
The main differences:
cmd instead of ctrl (e.g. cmd-C for copying)
right-click mouse: ctrl-click
# character: alt-3
switch between applications: cmd-tab
Spotlight (top right) for finding files/programs
Apple symbol (top left) for logging out
IDLE: Integrated DeveLopment Environment
open through Spotlight
IDLE: Integrated DeveLopment Environment
Alternatively: open through Finder
IDLE: Integrated DeveLopment Environment
interactive Python console
IDLE: Integrated DeveLopment Environment
simple Python statement
IDLE: Integrated DeveLopment Environment
user input
output
try a few simple numeric operations
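A few operations worth trying are sketched below. The particular numbers are arbitrary, and the division results assume Python 3 (as the print() calls elsewhere in these slides suggest).

```python
# A few numeric operations to try (Python 3 behaviour assumed).
total = 2 + 3          # addition
product = 4 * 5        # multiplication
quotient = 7 / 2       # true division gives a float
whole = 7 // 2         # floor division drops the fraction
remainder = 7 % 2      # modulo: what is left over
power = 2 ** 10        # exponentiation

print(total, product, quotient, whole, remainder, power)
# prints: 5 20 3.5 3 1 1024
```

In the interactive console the same expressions can be typed directly at the prompt, without print().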
IDLE: Integrated DeveLopment Environment
repeat/combine previous commands by clicking into them and hitting return
(use left/right arrows and delete to edit them)
IDLE: Integrated DeveLopment Environment
Console vs Editor
Console                      | Editor
interactive                  | requires extra click for running
great for trying out code    | additional IDLE functionality
not suited for long scripts  | suited for long scripts
no saving of code            | allows saving code
IDLE: Writing Python Scripts
open a new file
IDLE: Writing Python Scripts
write some code
IDLE: Writing Python Scripts
run your code
shortcut: F5
IDLE: Writing Python Scripts
save file first
IDLE: Writing Python Scripts
specify a file name
IDLE: Writing Python Scripts
write more code
IDLE provides help
IDLE: Writing Python Scripts
save and run:
cmd-S then F5
IDLE: Writing Python Scripts
make it personal
IDLE: Writing Python Scripts
keep going
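The code shown on these slides is not reproduced in this text, so the following is only a guess at what such a first script might look like. The file name hello.py, the function greeting and the name "Ada" are assumptions made for illustration.

```python
# hello.py - a hypothetical first script; the code on the slides is not shown here.

def greeting(name):
    """Build a personalised greeting."""
    return "Hello, " + name + "!"

print("Hello, world!")      # a first line of code
print(greeting("Ada"))      # "make it personal" with a name stored in the code
# To ask the user instead, replace "Ada" with input("What is your name? ")
```

Saving the file and pressing F5 in the IDLE editor runs the script and shows the output in the console window.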
Python vs Perl
the equivalent in Perl
Python vs Perl
Python, compared with Perl:
• fewer special characters
• indentation enforced
• more user-friendly functions
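"Indentation enforced" means that Python uses the indentation itself to decide which statements belong to a block, where Perl marks blocks with curly braces. A minimal sketch, with a made-up function name and threshold:

```python
# Indentation defines which statements belong to a block.
def classify(length):
    if length > 1000:
        label = "long"      # runs only when the condition is true
    else:
        label = "short"     # runs only when the condition is false
    return label            # back at function level: always runs

for length in [250, 4800]:
    print(length, classify(length))
```

Inconsistent indentation is reported by Python as an error rather than silently accepted.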
Why Python?
easy to learn: great for beginners
enforces clean coding: great for teachers
comes with IDE: avoids command-line usage
object-orientated: code reuse and recycling
very popular: many peers
BioPython: many bioinformatics modules
Simple Bioinformatics Example
built-in function 'len'
Simple Bioinformatics Example
built-in function 'set'
Simple Bioinformatics Example
built-in functions 'sorted' and 'set'
Simple Bioinformatics Example
string method 'count'
Simple Bioinformatics Example
string method 'upper'
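The built-in functions and string methods named on these slides can be tried on any sequence string. The sketch below uses a short made-up DNA sequence; the sequence and the variable name dna are assumptions, not the example from the original slides.

```python
# Hypothetical DNA sequence, chosen only for illustration.
dna = "atggcgtacgcttag"

print(len(dna))              # number of characters: 15
print(set(dna))              # the distinct letters (order not guaranteed)
print(sorted(set(dna)))      # ['a', 'c', 'g', 't']
print(dna.count("g"))        # occurrences of 'g': 5
print(dna.upper())           # ATGGCGTACGCTTAG
```

len, set and sorted are built-in functions that take the string as an argument, whereas count and upper are methods called on the string itself.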
Basic sequence manipulation
Fetch records from databases
Multiple sequence alignment (Clustal, Muscle)
Sequence similarity search (Blast)
Working with motifs: MEME, Jaspar, Transfac
Phylogenetics
Clustering
Visualisation
Parsing GenBank records:
>>> from Bio import SeqIO
>>> record = SeqIO.read("AE014613.1.gb", "genbank")
>>> record.description
'Salmonella enterica subsp. enterica serovar Typhi Ty2, complete genome.'
>>> len(record.features)
9086
Parsing sequence records:
from Bio import SeqIO
# loop over all records in the FASTA file
for entry in SeqIO.parse("tlr4_protein.fa", "fasta"):
    print(entry.description)
    # these are protein sequences, so the length is in amino acids (aa)
    print(len(entry), 'aa')
Output:
gi|765368240|gb|AJR32867.1| TLR4 [Gallus gallus]
843 aa
gi|111414439|gb|ABH09759.1| toll-like receptor 4 [Bos taurus]
841 aa
gi|6175873|gb|AAF05316.1|AF177765_1 toll-like receptor 4 [Homo sapiens]
839 aa
…
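To give an idea of what a FASTA parser has to do, here is a simplified pure-Python sketch. It is not Biopython's implementation; the function name read_fasta and the two example records are made up for illustration, and real files are better handled with SeqIO.parse.

```python
import io

def read_fasta(handle):
    """Yield (description, sequence) pairs from a FASTA-formatted file handle."""
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if description is not None:
                yield description, "".join(chunks)   # finish previous record
            description, chunks = line[1:], []       # start a new record
        elif description is not None:
            chunks.append(line)           # a sequence may span several lines
    if description is not None:
        yield description, "".join(chunks)           # last record in the file

# Made-up records standing in for a real FASTA file.
example = io.StringIO(">seq1 first example\nMKTLL\nAVG\n>seq2 second example\nMPS\n")
records = list(read_fasta(example))
for description, sequence in records:
    print(description, len(sequence), "aa")
```

Each record starts with a ">" header line, and all following lines up to the next header are joined into one sequence.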
Graphics:
Chromosomes colour-coded by GC content (Bioinformatics with Python Cookbook)
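The quantity behind such a plot, GC content, is the fraction of G and C bases in a sequence. A minimal sketch of that calculation follows; the function name gc_content and the test sequence are assumptions, and this is not the code from the cookbook.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence (0.0 for empty input)."""
    sequence = sequence.upper()           # treat 'g' and 'G' alike
    if not sequence:
        return 0.0                        # avoid dividing by zero
    gc = sequence.count("G") + sequence.count("C")
    return gc / len(sequence)

# Made-up sequence with 8 G/C bases out of 15.
print(gc_content("ATGGCGTACGCTTAG"))
```

For a chromosome-wide plot the same calculation would be repeated over consecutive windows of the sequence.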
Graphics:
Coloured phylogenetic tree from Ebola sequences (Bioinformatics with Python Cookbook)
Additional Resources
https://store.continuum.io/cshop/anaconda/
Visualisations with Matplotlib
http://matplotlib.org/gallery.html
Examples
http://scikit-learn.org
Scikit-learn – Machine Learning in Python
• Machine Learning: PCA of Iris data set
http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html
Python Help
Online courses
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://dowell.colorado.edu/education-python.html
http://www.pasteur.fr/formation/infobio/python
https://www.codecademy.com/tracks/python
http://anh.cs.luc.edu/python/hands-on/
https://www.coursera.org
Books
Conclusions
• You have been briefly introduced to Python and IDLE.
• You have learnt about programming concepts.
• You have seen examples of what can be accomplished through Python.
• Topics of an extensive Python course:
  • Coding in Python – variables, scope, functions…
  • Bioinformatics with BioPython
  • Automated biological data analysis – your interests!
Thank You!
http://bioinf.gen.tcd.ie/workshops/python
Don't forget to log out!