Professional Development Lecture 1 Research: The Big Picture

Feb 1, 2008 Professional Development
Series
1
Introduction
Chemical speciation modeling shows that Fe, Zn, Mn, and Co concentrations in an
Archaean anoxic ocean, a Proterozoic euxinic ocean, and a Modern oxic ocean would have
been quite different (Fig. 1). R.J.P. Williams and J.J.R. Frausto da Silva have long
contended that these changes have had an indelible effect upon the evolution of life,
particularly in the selection of elements for biological usage. Their theories further posit
that this selective force will have left imprints in the genomes of organisms, though this
has not been tested. Here we present the metal-binding structural contents of modern
proteomes, as they are inferred from bioinformatics analysis of fully sequenced genomes.
These results are reconciled with the theorized changes in global trace metal geochemistry.
Modern proteomes and putative “metallomic” imprints
of ancient changes in geochemistry
Christopher L. Dupont1, Song Yang2, Brian Palenik1, Philip E. Bourne3
1.Scripps Institution of Oceanography, University of California, San Diego
2.Department of Chemistry and Biochemistry, University of California, San Diego
3. San Diego Supercomputer Center and the Department of Pharmacology, University of California, San Diego
Contact: cdupont@ucsd.edu
Fe binding folds: Oxygen and redox shifts
The abundance of metal binding structures in a proteome adheres to a power law
[Figure 1 plot: concentrations of oxygen, iron, cobalt, and manganese (log scale) versus billions of years before present (4.5 to 0).]
Figure 1: Theoretical levels of trace metals and oxygen in the deep ocean through Earth’s
history. Whether the deep ocean became oxic or euxinic following the rise in atmospheric
oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean-solid lines, euxinic
ocean-dashed lines). The trace metal concentrations are replotted from Saito et al, 2003.
The phylogenetic tree symbols at the top of the figure show the theoretical periods of
diversification for each Superkingdom.
Methodology: Making the metallome
[Power-law figure, Panel A: x-axis, total domains in a proteome; points for Archaea, Bacteria, and Eukarya. Panel B: Zn, Fe, Mn, Co.]
2. Proteome Sequence (amino acid)
Figure 4: Panel A: Power law scaling for the abundance of metal binding domains. Each point is a discrete proteome from Archaea (■), Bacteria (+), or Eukarya (o), with the number of Zn binding domains on the Y-axis plotted against the total number of structural domains in a proteome, which is linear with genome size. Panel B: The slopes of the fitted power laws for Zn, Fe, Mn, and Co for each Superkingdom, which are evolutionary constants of proteome evolution (see below).
A slope of 1 indicates equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).
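As an illustration of how such a slope can be estimated, the sketch below fits a power law by ordinary least squares on log-transformed counts. It is not the poster's fitting procedure, which is not described; the function name fit_power_law and the synthetic counts are assumptions made for the example.

```python
import math

def fit_power_law(total_domains, metal_domains):
    """Fit y = m * x**b by least squares on log10-transformed counts.

    Returns (m, b), where b is the slope on a log-log plot.
    """
    # Zero counts cannot be log-transformed, so those proteomes are skipped.
    pairs = [(math.log10(x), math.log10(y))
             for x, y in zip(total_domains, metal_domains)
             if x > 0 and y > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two proteomes with non-zero counts")
    mean_x = sum(px for px, _ in pairs) / n
    mean_y = sum(py for _, py in pairs) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in pairs)
    if sxx == 0:
        raise ValueError("all proteomes have the same size")
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pairs)
    b = sxy / sxx
    log_m = mean_y - b * mean_x
    return 10.0 ** log_m, b

# Hypothetical proteomes: total structural domains and Zn-binding domains.
totals = [1_000, 3_000, 10_000, 30_000]
zn = [0.02 * t ** 1.2 for t in totals]  # synthetic data built with slope 1.2
m, b = fit_power_law(totals, zn)
```

A fitted b above 1 would correspond to the "preferentially duplicated" case described above, and a b below 1 to the opposite.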
The number of metal binding structural domains ($n_m$) in a proteome of size $p$ at any given time ($t$) is described by the generic equation:
$$n_m(t) = \left( \frac{n_m(0)}{p(0)^{\langle a_m \rangle / \langle a \rangle}} \right) p(t)^{\langle a_m \rangle / \langle a \rangle}$$
where $\langle a_m \rangle$ and $\langle a \rangle$ are time averages of the growth of a category and the entire proteome, respectively.
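A minimal sketch of evaluating this equation numerically follows. The function and parameter names are assumptions chosen to mirror the symbols above, and the input values are invented for illustration.

```python
def metal_domains_at(p_t, n_m0, p_0, a_m, a):
    """Evaluate n_m(t) = (n_m(0) / p(0)**(a_m/a)) * p(t)**(a_m/a).

    p_t  : proteome size at time t
    n_m0 : metal-binding domains in the ancestor (time zero)
    p_0  : proteome size of the ancestor
    a_m  : time-averaged growth rate of the metal-binding category
    a    : time-averaged growth rate of the whole proteome
    """
    exponent = a_m / a                  # the power-law slope
    prefactor = n_m0 / p_0 ** exponent  # fixed by the common ancestor
    return prefactor * p_t ** exponent

# Hypothetical ancestor with 10 metal-binding domains among 1000 domains.
same_rate = metal_domains_at(p_t=2000, n_m0=10, p_0=1000, a_m=1.0, a=1.0)
faster = metal_domains_at(p_t=2000, n_m0=10, p_0=1000, a_m=1.5, a=1.0)
```

With equal growth rates the category simply scales with proteome size; with a_m greater than a it grows faster than the proteome as a whole.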
3. HMM-based classification into structural fold families
4. A “metallome” for each proteome is constructed using a manually curated annotation of the SCOP database, which includes structural and functional information.
[Figure 2 example annotation labels: Fe, heme bound, oxidative defense; Fe, His bound; Zn, vitamin, His bound, metabolism, carbon assimilation.]
Figure 2: Pathway of metallome construction. The results of steps 1, 2, and 3 are contained in the Superfamily database. Step 4 is done using a manual annotation from the SCOP database.
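Step 4 of the pathway can be sketched as a lookup from fold-family assignments to a curated metal annotation, followed by counting. The dictionary below is a tiny hypothetical stand-in for the manual SCOP annotation, and build_metallome is an assumed name; neither reproduces the actual curated data.

```python
from collections import Counter

# Hypothetical excerpt of a curated annotation: fold family -> bound metal.
FOLD_TO_METAL = {
    "Cytochrome P450": "Fe",
    "4Fe-4S ferredoxins": "Fe",
    "Enolase": "Mn",
}

def build_metallome(domain_assignments, fold_to_metal=FOLD_TO_METAL):
    """Count metal-binding domains per metal for one proteome.

    domain_assignments: iterable of fold-family names, one entry per
    predicted structural domain (the output of step 3).
    """
    counts = Counter()
    for fold in domain_assignments:
        metal = fold_to_metal.get(fold)
        if metal is not None:  # domains without a metal annotation are ignored
            counts[metal] += 1
    return counts

# Hypothetical proteome with four assigned domains.
metallome = build_metallome(
    ["Cytochrome P450", "Enolase", "Unannotated fold", "Cytochrome P450"]
)
```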
[Figure 3 plot: unique Fe-binding fold families (108 total) versus the percent of Bacterial proteomes in which a fold family occurs (x, scale 0 to 100) and the average copy number (♦, scale 0 to 14).]
Metallomes are very diverse
Figure 3: A quantile plot showing the percent of Bacterial proteomes in which each Fe-binding fold family occurs (x). This plot also shows the average copy number of that fold family in the proteomes where it occurs (♦). Essentially, few Fe-binding folds are in most proteomes. Further, the widespread Fe-binding folds are not necessarily abundant. Similar trends are observed for Zn, Mn, and Co in all three Superkingdoms.
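The two quantities plotted in this figure can be computed from per-proteome fold counts as sketched below. The input layout (one dictionary of copy numbers per proteome) and the name fold_statistics are assumptions for the example, and the fold names are placeholders.

```python
def fold_statistics(proteomes):
    """Return {fold: (percent_of_proteomes, mean_copy_number_where_present)}.

    proteomes: list of dicts mapping fold family -> copy number.
    """
    n = len(proteomes)
    folds = {fold for p in proteomes for fold, copies in p.items() if copies > 0}
    stats = {}
    for fold in folds:
        # Only proteomes that actually contain the fold enter the average.
        copies = [p[fold] for p in proteomes if p.get(fold, 0) > 0]
        stats[fold] = (100.0 * len(copies) / n, sum(copies) / len(copies))
    return stats

# Four hypothetical proteomes with placeholder fold names.
stats = fold_statistics([
    {"A": 2, "B": 1},
    {"A": 4},
    {"A": 0, "C": 3},
    {"A": 6},
])
```

Sorting the folds by the first value gives the ordering used in a quantile plot of this kind.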
What does this mean?
1. The first term (blue) is defined by a common ancestor (time zero), and thus is the same for all proteomes in a given Superkingdom.
2. The second term is the slope of our observed power laws, indicating that the abundances of Zn, Fe, Mn, and Co binding domains conform to Superkingdom-specific evolutionary constants, regardless of the evolutionary history of the organism.
Therefore:
1. The proteomes of the Prokarya have preferentially recruited or retained Fe and Co binding domains during increases or decreases in proteome size, respectively, while excluding Zn binding domains.
2. Vice versa in the proteomes of Eukarya.
Why are the power laws different for each Superkingdom?
Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power law slopes describing Eukarya and Prokarya are similar to the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen.
We hypothesize that they are the result of the environment of the last common ancestor in each Superkingdom.
This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments.
Do the metallomes contain further support for this hypothesis?
References and Acknowledgements
Ubiquitous metal binding folds? Very few folds are found in all or most (>90%) proteomes. These include the tRNA synthetases (Zn), enolases (Mn), HemN (O2-independent coproporphyrinogen oxidase), and high-potential iron proteins (HiPIP).
Power Laws: fundamental constants in the evolution of proteomes
The power law is described by the function $y = m x^{b}$, where the exponent $b$ is the slope on a log-log plot. A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth.
1. Genome Sequence (actg)

Fold Family | % | Fe-binding | O2
Cytochrome P450 | 0.44 ± 0.48 | heme | yes
Cytochrome c3-like | 0.13 ± 0.3 | heme | no
Cytochrome b5 | 0.12 ± 0.09 | heme | no
Purple acid phosphatase | 0.11 ± 0.08 | amino | no
Penicillin synthase-like | 0.07 ± 0.1 | amino | yes
Hypoxia-inducible factor | 0.07 ± 0.04 | amino | yes
Di-heme elbow motif | 0.06 ± 0.01 | heme | no
Overall percent of Fe bound by: Fe-S 21 ± 9; heme 47 ± 19; amino 32 ± 12

Fold Family | % | Fe-binding | O2
4Fe-4S ferredoxins | 1.80 ± 0.7 | Fe-S | no
MoCo biosynthesis proteins | 1.60 ± 0.3 | Fe-S | no
Heme-binding PAS domain | 1.10 ± 1.0 | heme | 1
HemN | 0.80 ± 0.20 | Fe-S | no
α-helical ferredoxin | 0.60 ± 0.16 | Fe-S | no
Biotin synthase | 0.55 ± 0.1 | Fe-S | no
ROO N-terminal domain-like | 0.5 ± 0.1 | amino | 2
Overall percent of Fe bound by: Fe-S 68 ± 12; heme 13 ± 14; amino 19 ± 6

Fold Family | % | Fe-binding | O2
High potential iron protein | 0.38 ± 0.25 | Fe-S | no
Heme-binding PAS domain | 0.3 ± 0.4 | heme | 1
MoCo biosynthesis proteins | 0.21 ± 0.15 | Fe-S | no
HemN | 0.2 ± 0.15 | Fe-S | no
4Fe-4S ferredoxins | 0.2 ± 0.2 | Fe-S | no
Cytochrome c | 0.14 ± 0.2 | heme | no
α-helical ferredoxin | 0.12 ± 0.09 | Fe-S | no
Overall percent of Fe bound by: Fe-S 47 ± 11; heme 22 ± 12; amino 31 ± 16

1. Some, but not all, PAS domains actually sense oxygen.
2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway.
1. Any work by J.J.R. Frausto da Silva and R.J.P. Williams; Saito et al. 2003 Inorganica Chimica Acta 356: 308-318; Anbar and Knoll 2002 Science 297: 1137-1142; Van Nimwegen in Koonin et al. Power Laws.
C.L.D. would like to thank the Princeton Center for Environmental Bioinorganic Chemistry and the ASEE (NDSEG fellowship) for funding; P.E.B. is funded by NIH.
Table 1: The seven most abundant Fe binding folds in each Superkingdom, along with the mode of Fe binding. Also shown is whether O2 is present in reactions catalyzed by that fold. Essentially, Eukaryotic Fe binding folds are more likely to bind Fe by hemes or amino acids and also show an increased usage of oxygen.
This is consistent with the hypothesis that Eukarya evolved in an oxic environment.
The importance of “small class” Zn folds to Eukarya
[Figure 1 axis: Concentration (O2 in arbitrary units, Zn and Fe in moles L-1); series label: Zinc.]
[Power-law figure axes: Total Zn-binding domains in a proteome (Panel A); Slope of fitted power law by Superkingdom (Panel B).]
[Figure 5, Panel A axes: Total “small class” Zn binding domains versus total number of domains in a proteome (log-log).]
[Figure 5, Panel B labels and counts, in extraction order: Eukarya; 30/53, 18/28; 5/53, 0/28; Bacteria; 0/53, 0/28; 7/53, 0/28; 11/53, 9/28; 0/53, 0/28; Archaea; 0/53, 1/28.]
Figure 5: A: Log-log plot of the abundance of “small class” Zn binding folds in the proteomes for each Superkingdom. B: Venn diagram showing the distribution of the 53 unique small class Zn folds in each Superkingdom. The bottom set of numbers describes the distribution of small class Zn folds that occur in at least 50% of the proteomes in a given Superkingdom.
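The bookkeeping behind a Venn diagram like Panel B can be sketched with set operations. This is an illustration only: the function names, the 50% threshold helper, and the placeholder fold names are assumptions, and no counts from the figure are reproduced.

```python
from collections import Counter

def folds_in_at_least(proteomes, fraction=0.5):
    """Folds present in at least `fraction` of the proteomes (each a set of folds)."""
    counts = Counter(fold for p in proteomes for fold in set(p))
    return {fold for fold, c in counts.items() if c >= fraction * len(proteomes)}

def venn_regions(archaea, bacteria, eukarya):
    """Split three fold sets into the seven regions of a three-way Venn diagram."""
    a, b, e = set(archaea), set(bacteria), set(eukarya)
    return {
        "Archaea only": a - b - e,
        "Bacteria only": b - a - e,
        "Eukarya only": e - a - b,
        "Archaea and Bacteria": (a & b) - e,
        "Archaea and Eukarya": (a & e) - b,
        "Bacteria and Eukarya": (b & e) - a,
        "All three": a & b & e,
    }

# Placeholder fold names; real input would be the small class Zn folds.
regions = venn_regions(
    archaea={"fold1", "fold2"},
    bacteria={"fold2", "fold3"},
    eukarya={"fold1", "fold2", "fold3", "fold4"},
)
region_counts = {name: len(folds) for name, folds in regions.items()}
common = folds_in_at_least([{"a", "b"}, {"a"}, {"a", "c"}, {"b"}])
```

Running venn_regions once on all folds and once on the output of folds_in_at_least per Superkingdom would give the two sets of numbers the caption describes.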
Small class Zn folds are exemplified by Zn fingers and RING domains. They are believed to have
originally evolved in Archaea.
It seems unlikely that the observed diversification of Zn structures could occur in an
environment low in Zn (Fig. 1).
Potential methodological biases
1. Unknown folds: The results from the Protein Structure Initiative suggest that there will be few novel metalloproteins of widespread distribution and high abundance.
2. Genome bias: Principal component analysis shows that oxygen tolerance and environment have little effect upon the trends observed in Fig. 4. Phylogenetic groupings are apparent, however.
Conclusions
1. Metallomes have diverse compositions, yet the total abundances conform to evolutionary constants.
2. These constants exhibit Superkingdom-specific differences consistent with ancient changes in geochemistry, a hypothesis further supported by the roles of Zn and Fe.
3. These results provide genome-based evidence for the theory of Anbar and Knoll that Eukaryotic diversification and oxygen-related changes in trace metal chemistry are linked.
• Lei Xie, PhD
• Researcher
Repositioning Existing
Pharmaceuticals
Our laboratory is very interested in scientific dissemination in
the Web 2.0 era. To this end we have two major projects.
Scientific Dissemination and
Communication
The PDB contains a significant number of major pharmaceuticals bound
to their receptors. Lei Xie, with Sarah Kinnings and Jian Wang, has
developed a methodology for finding equivalent binding sites across
what we define as the druggable proteome. At this time, we estimate
this covers about 40% of all druggable targets. An equivalent binding
site for a major pharmaceutical holds promise for either (a) explaining
the side effects of existing drugs, or (b) using an existing drug (already
approved) to treat a different condition. Thus far we have one example
of each.
(1) BioLit is the work of Dr. Lynn Fink and involves the
integration of biological database content with the
biological literature. We are using the complete corpus of
the Public Library of Science journals (PLoS;
www.plos.org) and the Protein Data Bank (PDB;
www.pdb.org) as our prototype system. So for example, if
you access a PLoS paper online describing a structure-function relationship, you can click on a figure in the paper
and by accessing the associated structural data in the PDB
bring up a view of the molecule that maps directly to that
presented in the paper, rotate it, annotate it, and use it to
further query the PDB and the associated literature.
(2) SciVee is led by Apryl Bailey and involves Lynn Fink,
John Matherly, Alex Ramos, Willy Suwanto and Ben
Wilson. We refer to it as a YouTube for scientists. Check
it out at http://scivee.tv
• Kristine Briedis
• Iowa State University, B.S. Genetics
• Bioinformatics Graduate Program
• 6th year PhD student
Using Structure Similarity to Search for
New Human Protein Kinases
iGAP by EOL, an integrative annotation pipeline
This project utilizes the EOL pipeline to
identify new human kinases with its
automated annotation tool, iGAP. In
addition to traditional sequence alignment,
the more conserved structural elements are
considered when searching for remote
homologs. This is achieved by comparing
proteins to a comprehensive fold library to
predict function and structure.
[Pipeline figure labels: PDB, FoldLib, SCOP, PDP, WU-BLAST]
Selective Estrogen Receptor Modulators (SERMs) are a class of drugs that includes tamoxifen, which is used in the treatment of breast cancer. This drug has significant side effects attributed to disruption of calcium homeostasis. We believe we have found the target responsible for these side effects, namely a sarcoplasmic reticulum Ca2+ ion channel ATPase protein (SERCA). The challenge now is to design a modified SERM that has equal or better binding to estrogen receptors but less binding to SERCA. In a second experiment, we have identified a Parkinson’s disease drug that we believe will be very effective in the treatment of drug-resistant tuberculosis.
[Pipeline figure labels: 123D, PSI-Blast]
The Bourne Laboratory
http://www.sdsc.edu/pb
Proteome-wide Elucidation of the Molecular Mechanism Defining the Adverse Effect of Selective
Estrogen Receptor Modulators.
L. Xie and P.E. Bourne 2007 PLoS Comp. Biol., Submitted.
Genome-wide Study of the Evolution of
Protein Domains
Phylogeny Determined by Protein Domain Content.
S. Yang, R.F. Doolittle, and P.E. Bourne. 2005 PNAS 102: 373-378
Reliability
scoring
Analysis of the Human Kinome Using Methods Including Fold
Recognition Reveals Two Novel Kinases
K.M. Briedis and P.E. Bourne PLoS ONE, Submitted.
Our laboratory works in the general area of bioinformatics, with an emphasis on structural bioinformatics: the use of the complete corpus of macromolecular structure (proteins, DNA, RNA and complexes thereof) to further our understanding of living systems. We believe that when studying living systems the devil is in the details, and in many cases structure affords those details.
Our raw data come from the Protein Data Bank (PDB), which we maintain for the worldwide community and which is used by 10,000 scientists every day. Using these data we develop algorithms and methods in an
attempt to improve our understanding of biology through computation. Here you will find the work of
some of our students who study, for example, species differentiation based on protein fold content,
prediction of sites of protein-protein interaction, prediction of binding sites across the druggable
proteome, and the discovery of novel protein kinases within the human genome. We are committed to
the free distribution of software and to open access to all our findings.
• Ruben Valas
• Carnegie Mellon, BS Computer Science 2005
• Bioinformatics Graduate Program
• 3rd year PhD student
Rethinking proteasome evolution:
Two novel bacterial proteasomes
The proteasome is a multi-subunit structure that degrades
proteins. Protein degradation is an essential component of
regulation because proteins can become misfolded,
damaged, or unnecessary. Proteasomes and their
homologs vary greatly in complexity.
I am interested in the evolutionary aspect of protein structures. The protein domain, the basic three-dimensional structural element of proteins, is stabilized by its intrinsic physical and chemical properties. Each domain has its own specific functions and occupies a particular sequence space, resulting in its own evolutionary history. The study of the evolution of protein domains is not only an interesting topic, but also enhances our understanding of the sequence-structure-function relationship of proteins. Using protein domains to address evolutionary problems and studying the evolution of protein domains themselves are two facets of the topic I am working on. The figure at right is a phylogenetic tree of 174 species across all three major kingdoms, generated using protein domain content.
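One simple way to turn domain content into pairwise distances, which a tree-building method could then consume, is the Jaccard distance on presence/absence sets. This is a sketch under that assumption and is not necessarily the distance measure used for the tree described above; the species and domain names are placeholders.

```python
def jaccard_distance(domains_a, domains_b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of domain names."""
    a, b = set(domains_a), set(domains_b)
    union = a | b
    if not union:
        return 0.0  # two empty proteomes are treated as identical
    return 1.0 - len(a & b) / len(union)

def distance_matrix(species_domains):
    """Return (sorted species names, square matrix of pairwise distances)."""
    names = sorted(species_domains)
    matrix = [[jaccard_distance(species_domains[x], species_domains[y])
               for y in names] for x in names]
    return names, matrix

# Placeholder species and domain names.
names, matrix = distance_matrix({
    "species_1": {"d1", "d2"},
    "species_2": {"d2", "d3"},
    "species_3": {"d1", "d2"},
})
```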
Prediction
of structural
components
A novel protein kinase function for an acyl-CoA dehydrogenase protein has been
discovered with this process. This is
potentially significant because kinases
have been implicated in many diseases,
including some forms of cancer, thus
providing a new pharmaceutical target for
therapy. We are interested in
collaborations to further explore the role of
this putative kinase. Email
kbriedis@ucsd.edu for more information.
This work is supported by NIH GM63208.
Repurposing safe pharmaceuticals to treat multi-drug and extensively drug resistant
tuberculosis using an in silico cross-gene-family approach.
S. Kinnings, L. Xie and P.E. Bourne 2007 JACS, Submitted.
• Song Yang
• Beijing University, B.S. Chemistry
• Department of Chemistry and Biochemistry
• Graduated with PhD
Structural
assignments
My project identifies where and how proteins interact with each other using protein sequences and structures. We focus on exploiting the information extracted from 3D structures, which is expected to be very useful given the growing number of structures determined by structural genomics efforts.
• Jo-Lan Chung
• National Taiwan University, B.S. Chemistry 1999
• Department of Chemistry and
Biochemistry
• Graduated with PhD
Exploiting Sequence and
Structure Homologs to
Identify Protein-protein
Binding Sites
Structurally conserved residues, derived from multiple structure
alignments, are combined with sequence profile and accessible surface
area to predict protein-protein binding sites. The incorporation of
structure conservation significantly improves the prediction performance.
We are currently developing a prediction method to detect if two binding
sites are interacting with each other. The ultimate goal of this project is
to identify the binding sites of a protein and the corresponding binding
site on the interacting protein partner.
Exploiting Sequence and Structure Homologs to Identify Protein-Protein Binding Sites.
J.L. Chung, W. Wang, and P.E. Bourne 2006 Proteins: Structure, Function and Bioinformatics 62(3) 630-640.
We searched 238 complete bacterial genomes for
structures related to the proteasome, and found evidence
of two novel groups of bacterial proteasomes.
The first, which we name Anbu, is sparsely distributed
among cyanobacteria and proteobacteria. We hypothesize
that Anbu is an ancient proteasome. We also present
evidence for a fourth type of bacterial proteasome found in
a few β-proteobacteria, which we name β-proteobacteria
proteasome homolog (BPH).
Sequence and structural analyses show that Anbu and BPH are both distinct from known bacterial proteasomes, but have homologous structures. Anbu is encoded by one gene, so we postulate that a duplication of Anbu created the 20S proteasome. We have found different combinations of Anbu, BPH, and HslV within these bacterial genomes, which raises questions about specialized protein degradation systems.
This work is supported by the NIH grant 1P01GM63208-01A1 and 2T32 GM08326.
Rethinking proteasome evolution: Two novel bacterial proteasomes.
R. Valas and P.E. Bourne 2007 J. Mol. Evol., Submitted.