Computational analysis of biological systems: Past, present and future
Sven Bergmann
UNIL tenure track commission
5 January 2010

Research Overview
Large (genomic) systems
• many uncharacterized elements
• relationships unknown
• computational analysis should:
  - improve annotation
  - reveal relations
  - reduce complexity
Small systems
• elements well-known
• many relationships established
• aim at quantitative modeling of systems properties like:
  - Dynamics
  - Robustness
  - Logics

PAST: Large-scale data analyses

How to extract information from very large-scale expression data?
Search for transcription modules: sets of genes co-regulated under a certain set of conditions
• context specific
• allow for overlaps
J Ihmels, G Friedlander, SB, O Sarig, Y Ziv & N Barkai, Nature Genetics (2002)

Identification of transcription modules using many random “seeds”
(Diagram labels: random “seeds”, transcription modules)
Independent identification: Modules may overlap!
SB, J Ihmels & N Barkai, Physical Review E (2003)

New Tools: Module Visualization
http://maya.unil.ch:7575/ExpressionView

Data Integration: Example NCI60
60 cancer cell lines (9 tissue types)
• Drug Response Data: ~5,000 drugs
• Gene Expression Data: ~23,000 gene probes
How to identify co-modules?
Iteratively refine genes, cell lines and drugs to get co-modules
Z Kutalik, J Beckmann & SB, Nature Biotechnology (2008)

CoLaus = Cohort Lausanne
6,189 individuals
• Genotypes: 500,000 SNPs
• Phenotypes: 159 measurements, 144 questions
Collaboration with: Vincent Mooser (GSK), Peter Vollenweider & Gerard Waeber (CHUV)

PCA of POPRES cohort

Impact: Web of Science 2005-2009

Impact: Who cites our work?
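The module search summarized in the PAST slides (start from many random “seeds”, refine iteratively, allow modules to overlap) can be illustrated with a small sketch. This is a simplified toy version in the spirit of the iterative signature approach cited above, not the published implementation: the function names (`zscores`, `refine_module`, `find_modules`), the thresholds `t_gene` and `t_cond`, and the toy matrix `EXPR` are all assumptions made for illustration, and the normalization is deliberately cruder than in the original papers.

```python
import random
from statistics import mean, pstdev


def zscores(values):
    """Standardize scores to zero mean and unit variance (all zeros if constant)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def refine_module(expr, seed_genes, t_gene=1.0, t_cond=1.0, max_iter=50):
    """Alternately refine a condition set and a gene set until both stop changing.

    expr is a list of gene rows, each a list of expression values per condition.
    seed_genes must be non-empty. Returns (genes, conds) as sets of indices,
    or two empty sets if the seed does not lead to a module.
    """
    n_genes, n_conds = len(expr), len(expr[0])
    genes, conds = set(seed_genes), set()
    for _ in range(max_iter):
        # Score each condition by the average expression of the current genes.
        cond_z = zscores([mean(expr[g][c] for g in genes) for c in range(n_conds)])
        new_conds = {c for c, z in enumerate(cond_z) if z > t_cond}
        if not new_conds:
            return set(), set()
        # Score each gene by its average expression over the selected conditions.
        gene_z = zscores([mean(expr[g][c] for c in new_conds) for g in range(n_genes)])
        new_genes = {g for g, z in enumerate(gene_z) if z > t_gene}
        if not new_genes:
            return set(), set()
        if new_genes == genes and new_conds == conds:
            break  # fixed point reached
        genes, conds = new_genes, new_conds
    return genes, conds


def find_modules(expr, n_seeds=100, seed_size=2, rng=None):
    """Run the refinement from many random gene seeds and collect distinct results."""
    rng = rng or random.Random()
    found = set()
    for _ in range(n_seeds):
        seed = rng.sample(range(len(expr)), seed_size)
        genes, conds = refine_module(expr, seed)
        if genes:
            found.add((frozenset(genes), frozenset(conds)))
    return found


# Toy data (hypothetical): 8 genes x 6 conditions with two planted modules.
# Genes 0-2 are high in conditions 0-1, genes 2-4 are high in conditions 3-4,
# so gene 2 belongs to both modules.
EXPR = [
    [5, 5, 0, 0, 0, 0],
    [5, 5, 0, 0, 0, 0],
    [5, 5, 0, 5, 5, 0],
    [0, 0, 0, 5, 5, 0],
    [0, 0, 0, 5, 5, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

genes, conds = refine_module(EXPR, seed_genes=[0])
print(sorted(genes), sorted(conds))  # [0, 1, 2] [0, 1]
```

Because each seed is refined independently, two seeds can converge to modules that share genes (gene 2 in the toy data), which is the overlap property the slides emphasize. Seeds that mix unrelated genes may fail to converge to anything and are simply discarded in this sketch.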
PRESENT: Large-scale data analysis

Current insights from GWAS:
• Well-powered (meta-)studies with (tens of) thousands of samples have identified a few (dozen) candidate loci with highly significant associations
• Many of these associations have been replicated in independent studies
• Each locus explains only a tiny fraction (<1%) of the phenotypic variance
• All significant loci together explain only a small fraction (<10%) of the variance
David Goldstein: “~93,000 SNPs would be required to explain 80% of the population variation in height.” Common Genetic Variation and Human Traits, NEJM 360;17

So what do we miss?
1. Other variants like Copy Number Variations or epigenetics may play an important role
2. Interactions between genetic variants (GxG) or with the environment (GxE)
3. Many causal variants may be rare and/or poorly tagged by the measured SNPs
4. Many causal variants may have very small effect sizes

Status:
- Dec: submitted to PLoS Computational Biology (IF=6.2) (after positive reply to pre-submission inquiry)

Status: accepted for publication in Nature (IF=31.4)

Status:
- Dec: submitted to PLoS Genetics (IF=8.7), currently under review

Status:
- submitted to Biostatistics (IF=3.4, 2nd best of 92 journals for Statistics & Probability)
- Revision accounting for reviewers’ comments to be submitted soon

Status: accepted for publication in Gastroenterology (IF=12.6).
Status: submitted as application note to Bioinformatics (IF=4.32, 2nd best of 28 journals for Mathematical & Computational Biology)

Status: manuscript ready for submission to PLoS Computational Biology

PAST: Modeling

Drosophila as model for Development

Quantitative Experimental Study using Automated Image Processing
a: mark anterior and posterior pole, first and last eve-stripe
b: extract region around dorsal midline
c: semi-automatic determination of stripes/boundaries

Experimental Results: Positions
• Bergmann S, Sandler S, Sberro H, Shnider S, Shilo B-Z, Schejter E and Barkai N, Pre-Steady-State Decoding of the Bicoid Morphogen Gradient, PLoS Biology 5(2) (2007) e46.
• Bergmann S, Tamari Z, Shilo B-Z, Schejter E and Barkai N, Stability of the Bicoid Gradient? Cell 132 (2008) 15.

The Canonical Model
A bit of Theory…
The morphogen density M(x,t) can be modeled by a differential equation (reaction-diffusion equation):
$$\frac{\partial M}{\partial t} = D\,\frac{\partial^2 M}{\partial x^2} - \alpha M + s_0(x)$$
• Left-hand side: change in concentration of the morphogen at position x, time t
• Diffusion, D: diffusion constant
• Degradation, α: decay rate
• Source: s_0(x)

Model including nuclear trapping
$$\frac{\partial M}{\partial t} = D\,\frac{\partial^2 M}{\partial x^2} - k_n M B_N + k_{-n} M_n + s_0(x)$$
$$\frac{\partial M_n}{\partial t} = k_n M B_N - k_{-n} M_n$$
• free morphogen M(x,t): diffusion D, production s_0
• nuclear morphogen M_n(x,t)
• nuclear absorption: k_n
• nuclear emission: k_{-n}
• nuclei density: B_N(x,t)

PRESENT: Modeling

Precision is highest at mid-embryo
(Figure panels: 1xbcd, 2xbcd, 4xbcd; legend: Δ: Gt, Δ: Kr, □: Hb, o: Eve)
Similar trend in direct measurements of Bcd noise by Gregor et al. (Cell 2007)

Scaling is position-dependent!
“hyper-scaling” at anterior pole

Status:
- May: submitted to Molecular Systems Biology (IF=12.2)
- Aug: first resubmission after mostly positive reviews
- Dec: second submission (informally) accepted, subject to a proper response to minor issues

Modeling the Drosophila wing disk
• Partner in SystemsX.ch project WingX
  - PhD student: Aitana Morton de Lachapelle
  - PostDoc: Sascha Dalessi
• Image processing to obtain spatiotemporal measures of proteins
• Modeling Dpp gradient formation with focus on scaling

Modeling plant growth
• Partner in SystemsX.ch project PlantX
  - PostDoc: Micha Hersch
  - PostDoc: Tim Hohm
• Image processing to obtain spatiotemporal measures of seedlings
• Modeling shade avoidance behavior

Future directions

The challenge of many datasets: How to integrate all the information?
Data types:
- Genotypic (SNPs/CNVs)
- Epigenetic data
- Gene/protein expression
- Protein interactions
- Organismal data
(Diagram labels: Organisms, Data types, ?, Biological Insight)

Modular Approach for Integrative Analysis of Genotypes and Phenotypes
(Diagram labels: Phenotypes, Measurements, Modular links, Individuals, SNPs/Haplotypes, Genotypes)
Association of (average) module expression is often stronger than for any of its constituent genes

Towards interactions: Network Approaches for Integrative Association Analysis
Using knowledge on physical gene interactions or pathways to prioritize the search for functional interactions

Modeling: Cross-talk between Drosophila and Arabidopsis modeling
Both systems are growing multi-cellular tissues: modelers (in my group and within the two RTDs) may learn from each other and exchange tools

Acknowledgements to my group
People: Zoltán Kutalik, Micha Hersch, Aitana Morton, Diana Marek, Barbara Piasecka, Bastian Peter, Karen Kapur, Alain Sewer*, Toby Johnson*, Armand Valsesia, Gabor Csardi, Sascha Dalessi, Tim Hohm
*alumni
Funding: SystemsX.ch, SNSF, SIB, Cavaglieri, Leenaards, European FP
http://serverdgm.unil.ch/bergmann

Acknowledgements to my collaborators
DGM: Jacqui Beckmann, Roman Chrast, Carlo Rivolta
Uni Geneva: Stylianos Antonarakis, Manolis Dermitzakis, Jacques Schrenzel
Weizmann: Naama Barkai, Benny Shilo, Orly Reiner
CIG: Christian Fankhauser, Sophie Martin, Alexandre Reymond, Mehdi Tafti, Bernard Thorens
Uni Bern: Cris Kuhlemeier, Andri Rauch, Richard Smith
MRC Cambridge: Ruth Loos, Nick Wareham
UNIL/CHUV: Murielle Bochud, Pierre-Yves Bochud, Fabienne Maurer, Marc Robinson-Rechavi, Amalio Telenti, Peter Vollenweider, Gerard Waeber
EPFL: Dario Floreano, Felix Naef
Uni Basel: Markus Affolter, Mihaela Zavolan
ETH & Uni Zurich: Konrad Basler, Ernst Hafen, Matthias Heinemann, Christian von Mering, Markus Noll, Eckart Zitzler
Uni Minnesota: Judith Berman
GSK: Vincent Mooser, Dawn Waterworth
UCSD: Trey Ideker
UCLA: John Novembre

Teaching: Past and Present
http://www2.unil.ch/cbg/index.php?title=Teaching

Teaching: Future
1. How can we equip Biology students at UNIL with basic knowledge in Computational Biology?
• more “hands on” training!
• group projects
• new Master
2. How can we educate proficient Computational Biologists?
• New Master program jointly with SIB, UniGE?
• Develop ties with EPFL?

Integration: Past & Present

Integration: Future
How can UNIL/FBM strengthen its position in Computational Biology?
1. Networking!
2. Create new senior positions!
How can UNIL/FBM strengthen its ties with the industry?
(Diagram labels: CBG; Vincent Mooser, David Heard, Ulrich Genick, Pierre Farmer, Pietro Scalfaro, Andreas Schupert, Manuel Peitsch)