Course Review
• Name one important thing that you learnt
from this course that you feel will be
important to your research career
• Name one aspect you were hoping to
learn that you did not
Pharm 201 Lecture 19, 2011
Some Thoughts on the
Future of Biological Data
with Emphasis on Structural
Bioinformatics
Philip E. Bourne
Dept. of Pharmacology
University of California San Diego
pbourne@ucsd.edu
Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final
Structural Genomics:
A Broad Working Definition
Structural genomics is the process of high-throughput determination of the 3-dimensional structures of biological
macromolecules
SG - What is the Goal?
• The goal of the human genome project was clear-cut.
The goal of structural genomics is not so clear-cut
• Phase I:
– Provision of enough structural templates to facilitate
homology modeling of most proteins
– Structures of all proteins in a complete proteome
– Structural elucidation of a complete biological
pathway
– Structural elucidation of a complete disease
Example Goals (Phase I)
1257
“The hyperthermophilic bacterium
Thermotoga maritima has been the
target of choice for pipeline
development and genome-wide fold
coverage.”
“The SGPP consortium will determine and analyze the
three-dimensional structures of a large number of proteins
from major global pathogenic protozoa, Leishmania major,
Trypanosoma brucei, Trypanosoma cruzi and Plasmodium
falciparum.”
70
“It is aimed at determining
structures of proteins and protein
complexes directly relevant to
human health and diseases.”
117
Structural Genomics of
Pathogenic Protozoa
Growth in the Number of New Topologies
per Year According To CATH
SG Had Very Little Direct Impact on New Folds and Hence Homology Modeling
New Folds
Total Folds
http://www.rcsb.org/pdb/statistics/contentGrowthChart.do?content=fold-cath
from Nov. 2011
SG - What is the Goal? – Phase II
SG – Phase III – PSI:Biology
• The third phase of the PSI is called
PSI:Biology and is intended to reflect the
emphasis on the biological relevance of
the work
http://en.wikipedia.org/wiki/Protein_Structure_Initiative
Implications of Phase III SG
• Fewer single domains, more complex
structures
• More protein-protein complexes
• More protein-ligand complexes
• More membrane proteins
• Better models
• More hybrid structures
• More molecular machines
SG Accounts for 14% of Structures
From RCSB PDB Nov 2011
Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final
Crude Estimators of What We Know
and How We Might Get Better: Basics
• Data accessibility (60%)
• Domain definitions (80%)
• Structure comparison (80%)
• Disorder predictors (70%)
• Structure classification (80%)
• Need more computer-accessible information on function, etc.
• Need fresh approaches
• Need a better understanding of the role of protein disorder, period
• More quantitative approaches
Crude Estimators of What We Know
and How We Might Get Better
• Basic knowledge of
macromolecular structure
(50%)
• PPIs, protein-ligand
interactions, ligand view (30%)
• Missing temporal view,
alternative views
• Integrated view of structure as
part of a biological continuum
of data and associated
knowledge (30%)
• Need better quantification
• Structure prediction from
sequence (40%)
• Need more structures
• Missing robust rules for
molecular recognition
Crude Estimators of What We Know
and How We Might Get Better
• Inferring function from
structure (40%)
• Macromolecular
assemblies (40%)
• A combination of
improvements
• Hybrid methods
• Docking (30%)
• Rational drug discovery
(10%)
• Better scoring, flexible
docking, allostery
• Polypharmacology,
network pharmacology
• Evolution (10%)
• Accurate proteome coverage
Example of What Could be Done in Evolution: Structural Domains and the Tree of Life
http://itol.embl.de/
Natalie Dawson
Unpublished
Example of What Could be Done in Evolution: Structural Domains and the Tree of Life
Example: Structural Mapping and Subsequent Insights from All Biochemical Path
Example: Better
Understanding of Drug
Receptor Interactions
• Tykerb – Breast cancer
• Gleevec – Leukemia, GI cancers
• Nexavar – Kidney and liver cancer
• Staurosporine – natural product –
alkaloid – many uses, e.g.,
antifungal, antihypertensive
Collins and Workman 2006 Nature Chemical Biology 2 689-700
Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final
New Challenges
• Effective use of structural information in
systems biology – e.g., structural PPIs
• Bridging the biological scales in an easily
understood way
• New ways of visualizing and hence
thinking about proteins
• Protein design/engineering
Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final
The Bigger Picture - Numbers
On the Future of Genomic Data
Science 11 February 2011:
vol. 331 no. 6018 728-729
The Bigger Picture – Accuracy
Functional Misannotation
PLoS Comput Biol 2009
5(12): e1000605.
The Bigger Picture – Data Culture
• Data are not available
• Data are undervalued
• Data are stovepiped
• This is a long tail of data which are lost
• Institutional repositories are roach motels
• Data repositories will go like journals
Beyond Data: What is Wrong
Today?
What is Wrong Today?
• Formal science communication:
– Occurs too slowly
– Reaches too few people
– Costs too much
– Ignores the data
– Is very hard to reproduce
• Is stuck in the era of the printing press –
we need to move Beyond the PDF and
use the power of the medium
https://sites.google.com/site/beyondthepdf/
http://www.force11.org
The Research Enterprise
Methods
Data
Literature
The Current Reality
http://www.flickr.com/photos/51282757@N05/5585299226/lightbox/
[Diagram labels: Data (Database); Knowledge (Knowledgebase); Data Only; Data + Some Annotation; Data + Annotation; Data + Some Annotation + Some Integration; Wikis; Datapacks; Journals; Annotation]
PLoS
iStructure
The Knowledge and Data Cycle
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
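The cycle above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the dictionaries `PAPERS` and `PDB` are hypothetical in-memory stand-ins for the full-text paper store and the PDB, the identifier and entry are placeholders, and the helper names (`figures_for`, `analyze`, `composite_view`) are invented here and are not part of any PLoS or RCSB PDB API.

```python
# Hypothetical in-memory stand-ins (step 0); a real system would query a
# full-text store of papers and the PDB instead of these dictionaries.
PAPERS = {
    "doi:10.0000/example": {  # placeholder identifier, not a real paper
        "text_blocks": ["Block describing the active site.", "Block on methods."],
        "figures": [{"label": "Figure 1", "pdb_id": "XXXX", "text_block": 0}],
    }
}
PDB = {"XXXX": {"title": "Placeholder entry", "chains": ["A", "B"]}}


def figures_for(doi):
    """Step 1: a link brings up the figures from the paper."""
    return PAPERS[doi]["figures"]


def analyze(pdb_id):
    """Step 2: retrieve the structure behind a figure and run a toy analysis."""
    entry = PDB[pdb_id]
    return {"pdb_id": pdb_id, "title": entry["title"], "n_chains": len(entry["chains"])}


def composite_view(doi, figure):
    """Steps 3-4: combine journal and database content, with links both ways."""
    paper = PAPERS[doi]
    return {
        "figure": figure["label"],
        "structure": analyze(figure["pdb_id"]),
        "literature_text": paper["text_blocks"][figure["text_block"]],
        "links": {"paper": doi, "pdb": figure["pdb_id"]},
    }


view = composite_view("doi:10.0000/example", figures_for("doi:10.0000/example")[0])
print(view)
```

The point of the sketch is the shape of the composite view: one record that carries the analyzed structure, the relevant block of literature text, and links back to both the paper and the database entry.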
My Dream
1. User reads a paper
(one view of the info)
2. Clicks on a figure
which can be analyzed
3. Clicking the figure
gives a composite
database + journal
view
4. This takes you to yet
more papers or
databases
Methods
Data
Literature
It Goes Beyond Data
• It's hard and embarrassing to reproduce your
own work
• We have a working prototype using Wings
• I can feel the potential productivity gains
• My students are more doubtful
• It's been a lot of fun and will enable us to
improve our processes regardless of the
workflow system itself
Methods
Literature
Data
Yes The Workflow is Real
Methods
Data
Literature
Problems with Publishing Workflows
• Workflows are not linear
• Workflow : paper is not 1:1
• Confidentiality
• Peer review
• Infrastructure
• Community acceptance
• Reward system
• No publisher seems willing to touch them
Agenda
• What is structural genomics and what is its impact?
• Unsolved problems in structural bioinformatics
• New challenges related to structural bioinformatics
• The bigger picture
• The final
The Final
• Prepare a mini-grant research proposal
with the following ingredients:
– Background and Significance
– Preliminary Results
– Proposed Research and Methods
– Expected Outcomes
• The theme is any aspect of the course
where you would like to contribute new
research ideas and potential outcomes
The Final
• Points (50) will be awarded for:
– Background and Significance – literature coverage, justification of the originality and
potential importance of the contribution (20)
– Preliminary Results – anything you can actually accomplish to support the
proposal, e.g., pseudocode, computations using existing tools, etc.
(15)
– Proposed Research – the credibility and rigor of what you
propose (10)
– Expected Outcomes (5)
• There is no length requirement, but I would anticipate about 10 single-spaced
pages in 12 pt type to do the topic justice
• This should not relate to one of your previous assignments
• Feel free to email me to discuss ideas before starting