A Brief Introduction to Scientific Programming with Python
TCD, 26/08/2015
Karsten Hokamp, PhD
TCD Bioinformatics Support Team
Trinity College Dublin, The University of Dublin

Overview
• Programming
• First Python script/program
• Why Python?
• Bioinformatics examples
• Additional resources
• Outlook

What is programming and why bother?
• Data processing
• Automation
• Combination of programs for analysis pipelines
• More control and flexibility
• Better understanding of how programs work

Programming Concepts
• Turn into a very meticulous problem solver
• Break problems into small details
• Keep it variable
• Give very precise instructions

Programming Concepts: "human" recipe
Programming Concepts: "computerised" recipe

Mac for Windows users
The main differences:
• cmd instead of ctrl (e.g.
  cmd-C for copying)
• right-click mouse: ctrl-click
• # character: alt-3
• switch between applications: cmd-tab
• Spotlight (top right) for finding files/programs
• Apple symbol (top left) for logging out

IDLE: Integrated DeveLopment Environment
• open through Spotlight
• alternatively: open through Finder
• interactive Python console
• simple Python statement
• user input and output
• try a few simple numeric operations
• repeat/combine previous commands by clicking into them and hitting return (use left/right arrows and delete to edit them)

IDLE: Console vs Editor
Console                        Editor
interactive                    requires extra click for running
great for trying out code      additional IDLE functionality
not suited for long scripts    suited for long scripts
no saving of code              allows saving code

IDLE: Writing Python Scripts
• open a new file
• write some code
• run your code (shortcut: F5)
• save file first
• specify a file name
• write more code (IDLE provides help)
• save and run: cmd-S then F5
• make it personal
• keep going

Python vs Perl
the equivalent in Perl

Python (compared with Perl):
• fewer special characters
• indentation enforced
• more user-friendly functions

Why Python?
easy to learn            great for beginners
enforces clean coding    great for teachers
comes with IDE           avoids command-line usage
object-orientated        code reuse and recycling
very popular             many peers
BioPython                many bioinformatics modules

Simple Bioinformatics Example
• built-in function 'len'
• built-in function 'set'
• built-in functions 'sorted' and 'set'
• string method 'count'
• string method 'upper'

• Basic sequence manipulation
• Fetch records from databases
• Multiple sequence alignment (Clustal, Muscle)
• Sequence similarity search (Blast)
• Working with motifs: MEME, Jaspar, Transfac
• Phylogenetics
• Clustering
• Visualisation

Parsing GenBank records:
    from Bio import SeqIO
    record = SeqIO.read("AE014613.1.gb", "genbank")
    record.description
    'Salmonella enterica subsp. enterica serovar Typhi Ty2, complete genome.'
    len(record.features)
    9086

Parsing sequence records:
    from Bio import SeqIO
    for entry in SeqIO.parse("tlr4_protein.fa", "fasta"):
        print(entry.description)
        print(len(entry), 'bp')

Output:
    gi|765368240|gb|AJR32867.1| TLR4 [Gallus gallus]
    843 bp
    gi|111414439|gb|ABH09759.1| toll-like receptor 4 [Bos taurus]
    841 bp
    gi|6175873|gb|AAF05316.1|AF177765_1 toll-like receptor 4 [Homo sapiens]
    839 bp
    …

Graphics: Chromosomes colour-coded by GC content (Bioinformatics with Python Cookbook)
Graphics: Coloured phylogenetic tree from Ebola sequences (Bioinformatics with Python Cookbook)

Additional Resources
https://store.continuum.io/cshop/anaconda/

Visualisations with Matplotlib
http://matplotlib.org/gallery.html

Examples
http://scikit-learn.org

Scikit-learn – Machine Learning in Python
• Machine Learning: PCA of Iris data set
  http://scikit-learn.org/stable/auto_examples/decomposition/plot_pca_iris.html

Python Help

Online courses
http://biopython.org/DIST/docs/tutorial/Tutorial.html
http://dowell.colorado.edu/education-python.html
http://www.pasteur.fr/formation/infobio/python
https://www.codecademy.com/tracks/python
http://anh.cs.luc.edu/python/hands-on/
https://www.coursera.org

Books

Conclusions
• You have been briefly introduced to Python and IDLE.
• You have learnt about programming concepts.
• You have seen examples of what can be accomplished through Python.
• Topics of an extensive Python course:
  • Coding in Python – variables, scope, functions…
  • Bioinformatics with BioPython
  • Automated biological data analysis – your interests!

Thank You!
http://bioinf.gen.tcd.ie/workshops/python

Don't forget to log out!
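
Appendix: sketch for the "Simple Bioinformatics Example" slides
Those slides name the built-in functions 'len', 'set' and 'sorted' and the string methods 'count' and 'upper', but the screenshots with the actual code are not part of this text. The following is only a minimal sketch of how these five tools might be applied to a sequence. The DNA string and all variable names are invented for illustration and are not taken from the slides.

```python
# Hypothetical DNA sequence, made up for this illustration
dna = "atgcgcattagc"

# built-in function 'len': number of characters in the string
length = len(dna)

# built-in functions 'sorted' and 'set': unique letters in alphabetical order
alphabet = sorted(set(dna))

# string method 'count': occurrences of g and c, used for the GC percentage
gc_count = dna.count("g") + dna.count("c")
gc_percent = 100 * gc_count / length

# string method 'upper': the same sequence in upper case
upper = dna.upper()

print(length)
print(alphabet)
print(gc_count, gc_percent)
print(upper)
```

Typing these lines one at a time into the IDLE console shows each intermediate value directly. Saving them in the editor and pressing F5 runs them as a script.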