A Brief Introduction to Scientific Programming with Python
TCD, 26/08/2015
Karsten Hokamp, PhD
TCD Bioinformatics Support Team
Trinity College Dublin, The University of Dublin
Overview
• Programming
• First Python script/program
• Why Python?
• Bioinformatics examples
• Additional resources
• Outlook
What is programming and why bother?
• Data processing
• Automation
• Combining programs into analysis pipelines
• More control and flexibility
• Better understanding of how programs work
Programming Concepts
• Become a very meticulous problem solver
• Break problems down into small steps
• Keep it variable: use variables rather than fixed values
• Give very precise instructions
"human" recipe
"computerised" recipe
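The recipe comparison can be made concrete with a small, hypothetical "computerised" recipe that scales ingredient quantities to any number of servings. It applies the concepts listed above: the problem is broken into small steps, the number of servings is kept in a variable, and each instruction is precise. The ingredients and quantities are made up for illustration.

```python
# Hypothetical "computerised recipe": scale ingredient quantities
# to any number of servings. Quantities are made up for illustration.

# Keep it variable: change this one value instead of editing every step
servings = 6

# Quantity needed for ONE serving of each ingredient
per_serving = {
    "flour (g)": 50,
    "milk (ml)": 100,
    "eggs": 0.5,
}

# Precise, small steps: multiply each quantity by the number of servings
shopping_list = {}
for ingredient, amount in per_serving.items():
    shopping_list[ingredient] = amount * servings

# Print one line per ingredient
for ingredient, amount in shopping_list.items():
    print(ingredient, amount)
```

Changing servings to another number updates every quantity without touching the steps themselves.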
Mac for Windows users
The main differences:
• cmd instead of ctrl (e.g. cmd-C for copying)
• right-click: ctrl-click
• # character: alt-3 (UK/Irish keyboard layout)
• switch between applications: cmd-tab
• Spotlight (top right) for finding files/programs
• Apple menu (top left) for logging out
IDLE: Integrated DeveLopment Environment
open through Spotlight
Alternatively: open through Finder
interactive Python console
simple Python statement
user input and the resulting output
try a few simple numeric operations
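A few numeric operations worth trying in the console are collected in the sketch below. It assumes Python 3: there, / always returns a float (7 / 2 gives 3.5), whereas in Python 2 the same expression gives 3. In the console you would type each expression on its own; here they are stored in a dictionary so the script can print them all at once.

```python
# Expression text -> computed value (Python 3 semantics)
results = {
    "2 + 3": 2 + 3,      # addition: 5
    "7 - 10": 7 - 10,    # subtraction: -3
    "6 * 7": 6 * 7,      # multiplication: 42
    "7 / 2": 7 / 2,      # true division, always a float: 3.5
    "7 // 2": 7 // 2,    # floor division: 3
    "7 % 2": 7 % 2,      # remainder (modulo): 1
    "2 ** 10": 2 ** 10,  # exponentiation: 1024
}

# Show each expression next to its result
for expression, value in results.items():
    print(expression, "=", value)
```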
repeat/combine previous commands by clicking into them and hitting return (use the left/right arrows and delete to edit them)
Console vs Editor
Console
• interactive
• great for trying out code
• not suited for long scripts
• no saving of code
Editor
• requires an extra click to run the code
• additional IDLE functionality
• suited for long scripts
• allows code to be saved
IDLE: Writing Python Scripts
open a new file
write some code
run your code (shortcut: F5)
save file first
specify a file name
write more code: IDLE provides help
save and run: cmd-S, then F5
make it personal
keep going
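A first script along the lines of these steps might look like the sketch below. The greeting text and the name are illustrative assumptions; storing the name in a variable is one simple way to "make it personal", and input() could ask for it interactively instead.

```python
# First script: print a fixed message
print("Hello World!")

# "Make it personal": keep the name in a variable so it is easy to change.
# The name is a placeholder; to ask the user instead, you could write:
#     name = input("What is your name? ")
name = "Ada"
greeting = "Hello " + name + ", welcome to Python!"
print(greeting)
```

Saving with cmd-S and running with F5 prints both lines in the console window.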
Python vs Perl
the equivalent in Perl
Python
• fewer special characters
• indentation enforced
• more user-friendly functions
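To illustrate "indentation enforced": where Perl marks blocks with curly braces, Python uses indentation to decide which statements belong to a loop or an if-block. The sequences in this sketch are hypothetical example data.

```python
sequences = ["ATGC", "ATGCGGTA", "AT"]  # hypothetical example data
long_ones = []

for seq in sequences:
    # Everything indented under "for" runs once per sequence
    if len(seq) >= 4:
        # Indented under "if": runs only when the condition is true
        long_ones.append(seq)
    print(seq, len(seq))

# Back at the left margin: runs once, after the loop has finished
print("Sequences of length 4 or more:", long_ones)
```

Mis-indenting a line either changes which block it belongs to, and therefore what the program does, or makes Python stop with an IndentationError.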
Why Python?
• easy to learn
• great for beginners
• enforces clean coding
• great for teachers
• comes with an IDE (IDLE)
• avoids command-line usage
• object-oriented
• code reuse and recycling
• very popular
• many peers
• Biopython
• many bioinformatics modules
Simple Bioinformatics Example
built-in function 'len'
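As a minimal sketch of this example, assuming a short made-up DNA sequence stored as a plain string, len() returns the number of characters, i.e. the sequence length in bases:

```python
dna = "ATGCGATACGCTTGA"  # hypothetical example sequence

# len() counts the characters in the string, i.e. the number of bases
length = len(dna)
print("Sequence length:", length, "bp")  # Sequence length: 15 bp
```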
built-in function 'set'
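set() keeps each distinct element only once, which turns a sequence into the collection of letters it uses. A sketch with the same made-up sequence, plus one practical use assumed here for illustration: spotting characters other than A, C, G and T.

```python
dna = "ATGCGATACGCTTGA"  # hypothetical example sequence

# set() keeps each distinct character once; duplicates are discarded
letters = set(dna)
print(letters)  # sets are unordered, so the print order may vary

# Practical use: find characters that are not standard nucleotides
unexpected = letters - set("ACGT")
print("Unexpected characters:", unexpected)  # set() if all are A/C/G/T
```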
built-in functions 'sorted' and 'set'
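Because sets have no fixed order, wrapping the set in sorted() gives a reproducible, alphabetically ordered list of the letters used. A sketch, again with a made-up sequence:

```python
dna = "ATGCGATACGCTTGA"  # hypothetical example sequence

# set() removes duplicates; sorted() returns them as an ordered list
alphabet = sorted(set(dna))
print(alphabet)  # ['A', 'C', 'G', 'T']
```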
string method 'count'
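str.count() returns how often a substring occurs. A common use, assumed here for illustration, is computing GC content from the counts of G and C in a made-up sequence. Note that count() does not count overlapping matches: "AAAA".count("AA") is 2, not 3.

```python
dna = "ATGCGATACGCTTGA"  # hypothetical example sequence

# Count each base of interest with the string method count()
g = dna.count("G")
c = dna.count("C")

# GC content as a percentage of the sequence length (Python 3 division)
gc_content = (g + c) / len(dna) * 100
print("G:", g, "C:", c)
print("GC content: %.1f%%" % gc_content)  # GC content: 46.7%

# count() finds non-overlapping occurrences only
print("AAAA".count("AA"))  # 2
```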
string method 'upper'
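String comparisons and count() are case-sensitive, so mixed-case input (lowercase is used, for example, for soft-masked repeat regions in some genome files) can give misleading counts. Converting with upper() first avoids this. A sketch with a made-up mixed-case sequence:

```python
raw = "atgcGATAcgcttga"  # hypothetical mixed-case sequence

# count() is case-sensitive: only the single uppercase G is found
print(raw.count("G"))  # 1

# upper() returns a new, all-uppercase string (the original is unchanged)
dna = raw.upper()
print(dna)             # ATGCGATACGCTTGA
print(dna.count("G"))  # 4
```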
Biopython
• Basic sequence manipulation
• Fetching records from databases
• Multiple sequence alignment (Clustal, MUSCLE)
• Sequence similarity search (BLAST)
• Working with motifs: MEME, JASPAR, TRANSFAC
• Phylogenetics
• Clustering
• Visualisation
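Basic sequence manipulation such as reverse-complementing is available in Biopython through the reverse_complement() method of its Seq objects. To show what that operation does, here is a plain-Python sketch that needs no extra modules; it assumes a DNA sequence of A, C, G and T in either case, and passes any other character (such as N) through unchanged.

```python
# Translation table mapping each base to its complementary base
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence.

    Characters other than A/C/G/T (either case) are left unchanged.
    """
    # Complement every base, then reverse the string with slicing
    return seq.translate(COMPLEMENT)[::-1]


dna = "ATGCGATACGCTTGA"  # hypothetical example sequence
print(reverse_complement(dna))  # TCAAGCGTATCGCAT
```

Biopython's own version additionally handles IUPAC ambiguity codes, which this sketch does not.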
• Parsing GenBank records:
>>> from Bio import SeqIO
>>> record = SeqIO.read("AE014613.1.gb", "genbank")
>>> record.description
'Salmonella enterica subsp. enterica serovar Typhi Ty2, complete genome.'
>>> len(record.features)
9086
• Parsing FASTA sequence records:
from Bio import SeqIO
for entry in SeqIO.parse("tlr4_protein.fa", "fasta"):
    print(entry.description)
    print(len(entry), 'aa')  # protein sequences: length in amino acids
Output:
gi|765368240|gb|AJR32867.1| TLR4 [Gallus gallus]
843 aa
gi|111414439|gb|ABH09759.1| toll-like receptor 4 [Bos taurus]
841 aa
gi|6175873|gb|AAF05316.1|AF177765_1 toll-like receptor 4 [Homo sapiens]
839 aa
…
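SeqIO.parse() does the work of splitting a FASTA file into records. The idea behind it can be sketched in plain Python; the read_fasta helper below is hypothetical (it is not part of Biopython), assumes well-formed input, and reads made-up records from an in-memory string instead of tlr4_protein.fa.

```python
import io


def read_fasta(handle):
    """Yield (description, sequence) pairs from a FASTA text stream.

    Minimal sketch: assumes each record starts with a '>' header line
    followed by one or more sequence lines.
    """
    description, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header: emit the previous record, if any
            if description is not None:
                yield description, "".join(chunks)
            description, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may span several lines
    # Emit the last record in the file
    if description is not None:
        yield description, "".join(chunks)


# Hypothetical input; a real file would be opened with open(filename)
example = io.StringIO(">seq1 first protein\nMKVL\nAAG\n>seq2 second protein\nMSTNPKPQRKTK\n")
for description, sequence in read_fasta(example):
    print(description)
    print(len(sequence), "aa")
```

Biopython's parser covers many more formats and edge cases, which is why the slides use SeqIO rather than hand-written code.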
• Graphics:
Chromosomes colour-coded by GC content (Bioinformatics with Python Cookbook)
Coloured phylogenetic tree from Ebola sequences (Bioinformatics with Python Cookbook)
Additional Resources
https://store.continuum.io/cshop/anaconda/
Visualisations with Matplotlib
http://matplotlib.org/gallery.html
Examples
http://scikit-learn.org
Scikit-learn – Machine Learning in Python
• Machine Learning: PCA of Iris data set
http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html
Python Help
Online courses
• http://biopython.org/DIST/docs/tutorial/Tutorial.html
• http://dowell.colorado.edu/education-python.html
• http://www.pasteur.fr/formation/infobio/python
• https://www.codecademy.com/tracks/python
• http://anh.cs.luc.edu/python/hands-on/
• https://www.coursera.org
Books
Conclusions
• You have been briefly introduced to Python and IDLE.
• You have learnt about programming concepts.
• You have seen examples of what can be accomplished through Python.
• Topics of an extensive Python course:
   • Coding in Python – variables, scope, functions…
   • Bioinformatics with Biopython
   • Automated biological data analysis – your interests!
Thank You!
http://bioinf.gen.tcd.ie/workshops/python
Don't forget to log out!