Course Review
• Name one important thing that you learnt from this course that you feel will be important to your research career
• Name one aspect you were hoping to learn that you did not

Pharm 201 Lecture 19, 2011

Some Thoughts on the Future of Biological Data with Emphasis on Structural Bioinformatics
Philip E. Bourne
Dept. of Pharmacology
University of California San Diego
pbourne@ucsd.edu

Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final

Structural Genomics: A Broad Working Definition
Structural genomics is the process of high-throughput determination of the three-dimensional structures of biological macromolecules

SG - What is the Goal?
• The goal of the human genome project was clear cut. The goal of structural genomics is not so clear cut
• Phase I:
  – Provision of enough structural templates to facilitate homology modeling of most proteins
  – Structures of all proteins in a complete proteome
  – Structural elucidation of a complete biological pathway
  – Structural elucidation of a complete disease

Example Goals (Phase I)
1257
“The hyperthermophilic bacterium Thermotoga maritima has been the target of choice for pipeline development and genome-wide fold coverage.”
“The SGPP consortium will determine and analyze the three-dimensional structures of a large number of proteins from major global pathogenic protozoa, Leishmania major, Trypanosoma brucei, Trypanosoma cruzi and Plasmodium falciparum.”
70
“It is aimed at determining structures of proteins and protein complexes directly relevant to human health and diseases.”
117
Structural Genomics of Pathogenic Protozoa

Growth in the Number of New Topologies per Year According to CATH
SG Had Very Little Direct Impact on New Folds and Hence Homology Modeling
[Chart series: New Folds, Total Folds]
http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=fold-cath (from Nov. 2011)

SG - What is the Goal? – Phase II

SG – Phase III – PSI-Biology
• The third phase of the PSI is called PSI:Biology and is intended to reflect the emphasis on the biological relevance of the work
http://en.wikipedia.org/wiki/Protein_Structure_Initiative

Implications of Phase III SG
• Fewer single domains, more complex structures
• More protein-protein complexes
• More protein-ligand complexes
• More membrane proteins
• Better models
• More hybrid structures
• More molecular machines

SG Accounts for 14% of Structures
From RCSB PDB, Nov 2011

Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final

Crude Estimators of What We Know and How We Might Get Better
Basics
• Data accessibility (60%)
• Domain definitions (80%)
• Structure comparison (80%)
• Disorder predictors (70%)
• Structure classification (80%)
• Need more computer-accessible information on function etc.
• Need fresh approaches
• Need a better understanding of the role of protein disorder, period
• More quantitative approaches

Crude Estimators of What We Know and How We Might Get Better (continued)
• Basic knowledge of macromolecular structure (50%)
• PPIs, protein-ligand interactions, ligand view (30%)
• Missing temporal view, alternative views
• Integrated view of structure as part of a biological continuum of data and associated knowledge (30%)
• Need better quantification
• Structure prediction from sequence (40%)
• Need more structures
• Missing robust rules for molecular recognition

Crude Estimators of What We Know and How We Might Get Better (continued)
• Inferring function from structure (40%)
• Macromolecular assemblies (40%)
• A combination of improvements
• Hybrid methods
• Docking (30%)
• Rational drug discovery (10%)
• Better scoring, flexible docking, allostery
• Polypharmacology, network pharmacology
• Evolution (10%)
• Accurate proteome coverage

Example of What Could be Done in Evolution: Structural Domains and the Tree of Life
http://itol.embl.de/
Natalie Dawson, unpublished

Example of What Could be Done in Evolution: Structural Domains and the Tree of Life (continued)

Example: Structural Mapping and Subsequent Insights from All Biochemical Path

Example: Better Understanding of Drug Receptor Interactions
• Tykerb – breast cancer
• Gleevec – leukemia, GI cancers
• Nexavar – kidney and liver cancer
• Staurosporine – natural product, alkaloid; many uses, e.g., antifungal, antihypertensive
Collins and Workman (2006) Nature Chemical Biology 2: 689-700

Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final

New Challenges
• Effective use of structural information in systems biology, e.g., structural PPIs
• Bridging the biological scales in an easily understood way
• New ways of visualizing and hence thinking about proteins
• Protein design/engineering

Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final

The Bigger Picture - Numbers
On the Future of Genomic Data, Science 11 February 2011: vol. 331 no. 6018 728-729

The Bigger Picture – Accuracy
Functional Misannotation
PLoS Comput Biol 2009 5(12): e1000605.

The Bigger Picture – Data Culture
• Data are not available
• Data are undervalued
• Data are stovepiped
• There is a long tail of data which are lost
• Institutional repositories are roach motels
• Data repositories will go like journals

Beyond Data
What is Wrong Today?

What is Wrong Today?
• Formal science communication:
  – Occurs too slowly
  – Reaches too few people
  – Costs too much
  – Ignores the data
  – Is very hard to reproduce
• Is stuck in the era of the printing press – we need to move Beyond the PDF and use the power of the medium
https://sites.google.com/site/beyondthepdf/
http://www.force11.org

The Research Enterprise
[Diagram labels: Methods, Data, Literature]

The Current Reality
http://www.flickr.com/photos/51282757@N05/5585299226/lightbox/

[Diagram labels: Data, Database, Knowledge, Knowledgebase; Data Only; Wikis; Datapacks; Journals; Annotation; Data + Annotation; Data + Some Annotation; Data + Some Annotation + Some Integration; PLoS iStructure]

The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

My Dream
1. User reads a paper (one view of the info)
2. Clicks on a figure which can be analyzed
3. Clicking the figure gives a composite database + journal view
4. This takes you to yet more papers or databases

It Goes Beyond Data
• It's hard and embarrassing to reproduce your own work
• We have a working prototype using Wings
• I can feel the potential productivity gains
• My students are more doubtful
• It's been a lot of fun and will enable us to improve our processes regardless of the workflow system itself

Yes, the Workflow is Real

Problems with Publishing Workflows
• Workflows are not linear
• Workflow : paper is not 1:1
• Confidentiality
• Peer review
• Infrastructure
• Community acceptance
• Reward system
• No publisher seems willing to touch them

Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final

The Final
• Prepare a mini-grant research proposal with the following ingredients:
  – Background and Significance
  – Preliminary Results
  – Proposed Research and Methods
  – Expected Outcomes
• The theme is any aspect of the course where you would like to contribute new research ideas and potential outcomes

The Final (continued)
• Points (50) will be awarded for:
  – Background and Significance – literature coverage, justification of the originality and potential importance of the contribution (20)
  – Preliminary Results – anything you can actually accomplish to support the proposal, e.g., pseudocode, computations using existing tools, etc. (15)
  – Proposed Research – the credibility and rigor of what you propose (10)
  – Expected Outcomes (5)
• There is no length requirement, but I would anticipate that about 10 single-spaced pages in 12 pt type are needed to do the topic justice
• This should not relate to one of your previous assignments
• Feel free to email me to discuss ideas before starting