3rd Summer School in Computational Biology
September 8, 2014

Frank Emmert-Streib
Computational Biology and Machine Learning Laboratory
Center for Cancer Research and Cell Biology
Queen’s University Belfast, UK

Organizers of the summer school
General questions:
• Frank Emmert-Streib, f.emmert-streib@qub.ac.uk
• Shu-Dong Zhang, s.zhang@qub.ac.uk

Lecturers of the summer school
Ken Mills, Darragh McArt, Ricardo Matos Simoes, Shailesh Tripathi, Salissou Moutari, James McCann, Alexey Stupnikov, Frank Emmert-Streib, Shu-Dong Zhang
& Kevin Keenan, David Simpson, Caroline Meharg, Myrto Kostadima, Bori Mifsud

We thank our sponsors

History of the summer school
[Figure: bar chart of the number of participants per year, 2012 to 2014; bars labeled 35, 25 and 18]

Organizational notes
• Coffee breaks (short - foyer)
• Lunch (1 hour)
• Sign-in sheets
• Internet access:
  – Students from QUB: Use your QUB account
  – External students: Guest account (Shailesh Tripathi)

Schedule
introductory level (undergraduate level w.r.t. CB!)

What will we learn?
• different high-throughput data types:
  – Microarray data
  – Sequencing data (DNA-seq, RNA-seq, ChIP-seq)
• basic statistics and machine learning methods
  – Hypothesis testing
  – Supervised & unsupervised learning
• basic data visualization
• importance of large-scale data in modern biology
  – systems biology

Interdisciplinary summer school
Vision of the VC
Universities require interdisciplinary engagement in the educational and research effort
Professor Patrick Johnston, President and Vice-Chancellor (VC) of Queen’s University

What will we not learn? (Adjusting expectations)
• Example:
  – When learning a foreign language, how much can you learn in 3 days?
• Analogy:
  – programming language
  – statistics/machine learning
  – biology
The time it takes to become proficient in computational biology is comparable to the time it takes to learn a language.

Good news!
• The summer school in computational biology provides you with a guided start.
• If you are from Belfast:
  – Journal club: computational biology and biostatistics (every Monday in the HSB, 3pm)
  – Degree: MSc in Computational Genomics & Bioinformatics
  – General problems/questions: Frank Emmert-Streib

High-throughput data
Data Types

Central Dogma of Molecular Biology
Francis Crick, 1956

Reproducible Research
What is reproducible research?
Reproducible research means that an entire study can be reproduced, either by the same researcher or by an independent researcher.
In this context is important.

Example
To understand the meaning of reproducible research, let’s consider the following example.
Task: Produce the figure.
[Figure: diagram with axes x and y and the labels "target concept", f, h, R and P(R)]
Approach: Adobe Illustrator, GIMP, CorelDraw, PowerPoint
Summary:
  – How long did it take? t = 30min
  – How did you do it? Describe it in a report.

Example
When you publish results, e.g., the figure from the task above, and someone wants to repeat the same or a similar analysis:
  – How long does a re-analysis take? 30min
  – How is a re-analysis done? That depends on the report you provided & the availability of the software.

Alternative way to generate results
Create the figure by writing a program.
• LaTeX
• freely available

Comparison
                                                                | Proprietary software with GUI | Programming language
Time for you to create the figure for the first time            | t = 30min                     | t = 30min
Time for you to create the figure for the n-th time             | ts < t (ts = 20min)           | tp < t (tp << 1sec)
Time for someone else to create the same figure for first time  | t’ ~ t (t’ = 30min)           | t’’ ~ tp (t’’ << 1sec)
Need to pay for license?                                        | Yes                           | No
Figure reproducible by everyone?                                | No                            | Yes

Back to data analysis
The same line of argumentation holds for the analysis of data.
• Create a figure -> conduct a data analysis
• Adobe Illustrator -> Partek, GenomeStudio etc.
In order to obtain reproducible results in ‘genomics’ we use R.

Reproducible research
• Analyze data by writing programs in R.
• Share your data & your programs with others.
Other groups can reproduce your results.
For this reason we use R in this summer school.

Data sharing
The US National Institutes of Health (NIH) requires that all genomics data generated with NIH funding be shared online.
Nature, 4 September 2014
Mandatory!

Enjoy the summer school!
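
The "create the figure by writing a program" route from the comparison above can be sketched in a few lines. The summer school itself uses R; this sketch uses Python with only the standard library as a stand-in. The data values, the function names (`bar_chart_svg`, `fingerprint`, `save_figure`) and the SVG layout are illustrative assumptions, not taken from the slides.

```python
import hashlib

# Hypothetical example data (NOT taken from the slides).
DATA = {"A": 3, "B": 7, "C": 5}


def bar_chart_svg(data, bar_width=40, gap=20, scale=20):
    """Return an SVG bar chart as a string.

    The output depends only on the arguments, so the same data always
    gives the same figure. Labels are assumed to contain no XML special
    characters.
    """
    height = max(data.values()) * scale + 30
    width = gap + len(data) * (bar_width + gap)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">'
    ]
    for i, (label, value) in enumerate(data.items()):
        x = gap + i * (bar_width + gap)
        bar_height = value * scale
        y = height - 20 - bar_height  # leave 20 px at the bottom for labels
        parts.append(
            f'<rect x="{x}" y="{y}" width="{bar_width}" '
            f'height="{bar_height}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{x + bar_width // 2}" y="{height - 5}" '
            f'text-anchor="middle">{label}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def fingerprint(text):
    """SHA-256 digest of the figure source, to check that two runs agree."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def save_figure(path, data):
    """Write the chart to `path`; rerunning regenerates the same file."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(bar_chart_svg(data))
```

Calling `save_figure("figure.svg", DATA)` creates the figure file. Anyone who has the script and the data can rerun it and compare `fingerprint(bar_chart_svg(DATA))` with the published value to confirm they obtained the same figure.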