Principles of Protein Structure

Modeling Protein Function
MED260
Philip E. Bourne
Department of Pharmacology, UCSD
pbourne@ucsd.edu
http://www.sdsc.edu/pb
Slides on-line at:
http://www.sdsc.edu/pb/edu/med260/med260.ppt
MED260 Modeling Protein Function
- October 11, 2006
Agenda
• Why model protein function?
• Where does it fit as a technique in modern medical
research?
• The data deluge as a motivator
• The extent of what can be modeled
• Ontologies – establishing order from chaos
• Examples of what can be learnt
• Accuracy – a word of caution
Why Model Protein Function
• The rate of discovery of new proteins far
outweighs our ability to functionally characterize
them
• Functional discovery of new proteins has
implications in:
– Drug discovery
– Biomarker identification
– Understanding of biological processes
– Identification of disease states and treatment regimes
Why model protein function?
Where does it fit as a technique in modern medical research?
[Figure: Scientific research & discovery across scales of biological complexity. Scales: Organisms, Organs, Cells, Macromolecules, Biopolymers, Atoms & Molecules. Representative disciplines with example units: Anatomy (MRI), Physiology (Heart), Cell Biology (Neuron), Proteomics (Structure), Genomics (Sequence), Medicinal Chemistry (Protease Inhibitor). Representative technologies: Migratory Sensors, Ventricular Modeling, Electron Microscopy, X-ray Crystallography, Protein Docking.]
Where does it fit as a technique in modern medical research?
[Figure: the same diagram as on the previous slide, with "Translational Medicine" added.]
The Ability to Model Protein Function
Influences and can be Influenced by Any
Level of Biological Complexity - Examples
• Genome - rapid increase in sequenced genomes provides
new raw material
• Proteome – large increase in the number of 3D structures
highlights new functions
• Interactome – identification of a binding partner points to
a new function
• Metabolome – isolation of a protein within a metabolic
pathway
• Cell - localization points to function
• Organ – gene expression in heart tissue points to function
• Organism – different physiology observed in species can
be related to protein functions
Where does it fit as a technique
in modern medical research?
[Figure: the same diagram again, with "We will focus here" marking Proteomics, Genomics and Medicinal Chemistry (Macromolecules, Biopolymers, Atoms & Molecules).]
At All Levels We Are Being Driven By Data
[Figure: Biological Experiment → Data → Information → Knowledge → Discovery, via the steps Collect, Characterize, Compare, Model and Infer.]
The Data Deluge
[Figure: biological complexity and technology plotted against year (90, 95, 00, 05). Axis labels: Complexity, Technology, Computing Power, Data, # People/Web Site. Items shown include Sequence, ESTs, Gene Chips, Genetic Circuits, Sub-cellular Structure, Cellular Assembly, Organ and Higher-life; the E. coli, Yeast and C. elegans genomes, the Human Genome Project and the Human Genome; Virus Structure, Ribosome, Model Metabolic Pathway of E. coli, Neuronal Modeling, Cardiac Modeling, Brain Mapping and Virtual Communities; and Sequencing Technology at 1 small genome per month.]
Metagenomics: A First Look
• New type of genomics
• New data (and lots of it) and new types of data
– 17M new (predicted!) proteins: 4-5x growth in just a few months, with much more coming
– New challenges and exacerbation of old challenges
The Data Deluge
Metagenomics: First Results
• More than 99.5% of DNA in every environment studied represents unknown organisms
– Culturable organisms are exceptions, not the rule
• Most genes represent distant homologs of known genes, but there are thousands of new families
• Everything we touch turns out to be a gold mine
• Environments studied:
– Water (ocean, lakes)
– Soil
– Human body (gut, oral cavity, human microbiome)
The Data Deluge
Metagenomics New Discoveries
Environmental (red) vs. Currently Known PTPases (blue)
The Data Deluge
The Good News and the Bad News
• Good news
– Data pointing towards function are growing at
near exponential rates
– IT can handle it on a per dollar basis
• Bad news
– Data are growing at near exponential rates
– Quality is highly variable
– Accurate functional annotation is sparse
The Data Deluge
Genomes - 2004
• We all know about the human – what is not
so well known is:
– 191 completed microbial genomes
– 44 archaea
– 727 bacteria
– 785 eukaryotes (complete or in progress)
– Viroids ….
The Data Deluge
Proteome
• We are reasonably good at finding proteins
in genomes with intergenic regions, but not
perfect (e.g., alternative initiation codons)
• Regulatory elements provide a different set
of challenges
• We are not so good at assigning functions to
those proteins
• Moreover the devil is in the details
The Extent of What Can Be Modeled
Estimated Functional Roles (by % of Proteins)
of the Proteome in a Complex Organism
The Extent of What Can Be Modeled
Functional Nomenclature Needs to be Consistent
for Orderly Progress – Enter EC and GO
• EC classifies all enzymes
http://www.chem.qmul.ac.uk/iubmb/enzyme/
• Gene Ontology Consortium characterizes
gene products by molecular function, biological
process and cellular location
http://www.geneontology.org/
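A minimal sketch of why a consistent nomenclature helps, using the four-level EC number as the example. The class table below covers the six long-standing top-level EC classes; the `ECNumber` type and its method names are illustrative assumptions, not part of any EC or GO tool.

```python
from dataclasses import dataclass

# Top-level EC classes, keyed by the first level of an EC number.
EC_CLASSES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
}


@dataclass(frozen=True)
class ECNumber:
    """Hypothetical helper: an EC number held as four string levels."""
    levels: tuple  # "-" marks a level that has not been assigned

    @classmethod
    def parse(cls, text):
        parts = text.replace("EC", "").strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated levels: {text!r}")
        return cls(tuple(parts))

    @property
    def class_name(self):
        return EC_CLASSES.get(self.levels[0], "unknown class")

    def shared_depth(self, other):
        """Number of leading levels on which two EC numbers agree."""
        depth = 0
        for a, b in zip(self.levels, other.levels):
            if a != b or a == "-":
                break
            depth += 1
        return depth


pka = ECNumber.parse("EC 2.7.11.11")      # cAMP-dependent protein kinase
hexokinase = ECNumber.parse("EC 2.7.1.1")
trypsin = ECNumber.parse("EC 3.4.21.4")

print(pka.class_name, pka.shared_depth(hexokinase), pka.shared_depth(trypsin))
```

Because the levels are hierarchical, the shared depth gives a crude but consistent measure of how similar two annotated enzyme functions are.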
Ontologies –
establishing order from chaos
Functional
Coverage of the
Human Genome
40% covered
http://function.rcsb.org:8080/pdb/function_distribution/index.html
The Extent of What Can Be Modeled
Step 1. Learn What You Can from
the Protein Sequence
• Find it
• Pay attention to the quality of the functional
annotation – errors are transitive
• Understand its 1-D structure – domain
organization, {signatures, fingerprints}
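A sequence signature can be as simple as a short pattern. The sketch below converts a PROSITE-style pattern into a regular expression and scans a sequence for it, using the well-known P-loop (Walker A) ATP/GTP-binding motif as the example pattern. The function names and the toy sequence are assumptions made for illustration; the converter handles only the basic PROSITE syntax.

```python
import re


def prosite_to_regex(pattern):
    """Translate basic PROSITE pattern syntax into a Python regex string."""
    out = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("x", ".")                       # x = any residue
        element = element.replace("{", "[^").replace("}", "]")    # {P} = anything but P
        element = element.replace("(", "{").replace(")", "}")     # x(4) = four residues
        element = element.replace("<", "^").replace(">", "$")     # terminus anchors
        out.append(element)
    return "".join(out)


def find_motif(sequence, pattern):
    """Return (1-based start, matched text) for each non-overlapping hit."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


P_LOOP = "[AG]-x(4)-G-K-[ST]"        # ATP/GTP-binding site motif A
toy_sequence = "MKTAYGAPGSGKSTLLQ"   # invented sequence, for illustration only

print(prosite_to_regex(P_LOOP))
print(find_motif(toy_sequence, P_LOOP))
```

A hit of this kind is only a hint of function. Short patterns also match by chance, so a signature match should be weighed together with domain organization and the quality of the annotation it came from.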
Examples of what can be learnt
Step 2. Is there a 3D Structure? If so
What Can You Learn from That?
• Find it
• Understand it
• Characterize it
• Understand its function(s) – these follow a
power law at the fold level – some folds are
promiscuous (many functions), others are
solitary or of unknown function
Examples of what can be learnt
(a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA
(e) antibodies (f) viruses
(g) actin
(h) the nucleosome
(i) myosin
(j) ribosome
Courtesy of David Goodsell, TSRI
First Why Bother with Structure?
An Example: Protein Kinase A
This “molecular scene” for cAMP-dependent protein kinase depicts years of collective knowledge.
Beyond basics, only the atomic coordinates are captured by the PDB.
Functional annotation requires the literature.
Examples of what can be learnt
What Did that Picture Tell Us?
• Two domains with associated functions
• ATP binding & substrate binding
• Conserved residues and their spatial locations reveal details of ATP and substrate binding and the mechanism of the phosphotransfer reaction
• So is structure the answer to functional modeling?
Examples of what can be learnt
Question: So is structure the answer to
functional modeling?
Answer: Partly - The number of unique
protein sequences still outnumbers the
number of unique structures by 100:1
Enter Structural Genomics
Enter Structure Prediction
Examples of what can be learnt
The Structural Genomics Pipeline
(X-ray Crystallography)
Basic Steps
Target Selection → Crystallomics (Isolation, Expression, Purification, Crystallization) → Data Collection → Structure Solution → Structure Refinement → Functional Annotation → Publish
Examples of what can be learnt
Structural Genomics Will Give Us..
• Good news
– More structures (definitely)
– New folds (some but not as anticipated)
– New understanding of specific diseases and pathways
(maybe)
– Representatives from each major protein family
(maybe)
• Bad news
– Many new structures that are functionally unclassified
(definitely)
Examples of what can be learnt
What About Structure Prediction?
• Current rule:
We will be able to predict a structure when we
know all the structures
Examples of what can be learnt
Why is Structure Prediction so Hard?
Random 1000 structurally similar PDB polypeptide chains with z > 4.5
(% sequence identity vs alignment length)
Twilight Zone
Midnight Zone
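The two quantities on the plot's axes can be computed from any pairwise alignment. The sketch below is a rough illustration: the zone boundaries (35% and 20% identity) are commonly quoted approximate values supplied here as adjustable assumptions, not figures taken from the slide, and the function names are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (% identity, aligned length), ignoring columns with a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != gap and b != gap]
    if not pairs:
        return 0.0, 0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs), len(pairs)


def zone(identity, twilight_upper=35.0, midnight_upper=20.0):
    """Label an identity value; both thresholds are illustrative assumptions."""
    if identity >= twilight_upper:
        return "above the twilight zone"
    if identity >= midnight_upper:
        return "twilight zone"
    return "midnight zone"


# Toy alignment, invented for illustration.
identity, length = percent_identity("ACDE-FG", "ACDQWFG")
print(round(identity, 1), length, zone(identity))
```

Structurally similar chains often fall into the two lower zones, where sequence comparison alone cannot reliably detect the relationship. That is one reason structure prediction from sequence is hard.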
Examples of what can be learnt
Approaches to Structure Prediction
• Homology modeling
• Threading (aka fold recognition)
• Ab initio
• How well do we do? – see CASP
• Consensus servers
– EVA - http://cubic.bioc.columbia.edu/eva/
– LiveBench - http://bioinfo.pl/meta/
Examples of what can be learnt
Step 3. What Can Be Got from Structure
When You Have it?
From Structural Bioinformatics, Bourne and Weissig (Eds.), p. 394, Wiley 2002
Examples of what can be learnt
Specific Example
• Mj0577 – putative ATP molecular switch
Mj0577 is an open reading frame (ORF) of previously unknown function
from Methanococcus jannaschii. Its structure was determined at 1.7 Å
(Figure 7a) (Zarembinski et al., 1998). The structure contains a bound
ATP molecule, picked up from the E. coli host. The presence of bound
ATP led to the proposition that Mj0577 is either an ATPase or an
ATP-binding molecular switch. Further experimental work showed that
Mj0577 cannot hydrolyse ATP by itself, and can only do so in the
presence of M. jannaschii crude cell extract. Therefore it is more
likely to act as a molecular switch, in a process analogous to ras-GTP
hydrolysis in the presence of GTPase activating protein.
From Structural Bioinformatics, Bourne and Weissig (Eds.), p. 402, Wiley 2002
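A bound ligand such as the ATP in this example shows up in a PDB-format file as HETATM records. The sketch below lists the non-water hetero groups in a block of PDB-format text, relying on the format's fixed columns (residue name in columns 18-20, chain in column 22, residue number in columns 23-26). The records are invented for illustration and are not taken from the Mj0577 entry; `bound_ligands` and `_record` are hypothetical helper names.

```python
def _record(kind, serial, atom, res_name, chain, res_seq):
    """Build the leading fixed-width fields of a PDB ATOM/HETATM line."""
    return f"{kind:<6}{serial:>5} {atom:<4} {res_name:>3} {chain}{res_seq:>4}"


def bound_ligands(pdb_text, ignore=("HOH",)):
    """Return sorted (residue name, chain, residue number) of hetero groups."""
    ligands = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            res_name = line[17:20].strip()
            if res_name not in ignore:
                ligands.add((res_name, line[21], int(line[22:26])))
    return sorted(ligands)


# Invented records: one protein atom, two ATP atoms, one water.
SAMPLE = "\n".join([
    _record("ATOM", 1, "N", "MET", "A", 1),
    _record("HETATM", 900, "PG", "ATP", "A", 200),
    _record("HETATM", 901, "PB", "ATP", "A", 200),
    _record("HETATM", 950, "O", "HOH", "A", 301),
])

print(bound_ligands(SAMPLE))
```

In real entries, modified residues such as selenomethionine are also stored as HETATM records, so the output is a list of candidates to inspect rather than a list of confirmed ligands.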
Examples of what can be learnt
Step 4. Proteins Do Not Function in Isolation
But are Part of Complex Interaction Networks
http://www.genome.jp/kegg/
Examples of what can be learnt
Accuracy - A Word of Caution
• Errors are transitive
– Proteins A and B are observed to have similar
functions through sequence homology
– Proteins B and C are observed to have similar
functions through sequence homology
– Is protein A related to protein C?
– Up to 30% of current annotation may be wrong
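One way the A, B, C chain above can go wrong is with multi-domain proteins. In the sketch below, A and B are similar because they share one domain, B and C because they share another, and A and C share nothing. The proteins, their domain compositions and the function names are all hypothetical, chosen only to illustrate the point.

```python
# Hypothetical domain compositions.
DOMAINS = {
    "A": {"kinase"},
    "B": {"kinase", "SH2"},
    "C": {"SH2"},
}


def similar(p, q):
    """Two proteins look 'homologous' if they share at least one domain."""
    return bool(DOMAINS[p] & DOMAINS[q])


def naive_transfer(annotation, chain):
    """Copy an annotation along a chain whenever neighbours look similar."""
    annotated = {chain[0]: annotation}
    for prev, nxt in zip(chain, chain[1:]):
        if prev in annotated and similar(prev, nxt):
            annotated[nxt] = annotated[prev]
    return annotated


result = naive_transfer("protein kinase", ["A", "B", "C"])
print(result)            # C inherits the kinase annotation ...
print(similar("A", "C"))  # ... although A and C share no domain
```

Once the wrong label on C is deposited in a database, later searches can copy it further, which is how annotation errors spread.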
Accuracy - A Word of Caution
Questions?
Demo of Steps 1-4
• Step 1. Learn What You Can from the Protein
Sequence
• Step 2. Is there a 3D Structure? If So, What Can
You Learn from That?
• Step 3. What Can Be Got from Structure When
You Have it?
• Step 4. Proteins Do Not Function in Isolation But
are Part of Complex Interaction Networks