The Evolution of Protein Structure and Function as Studied through Structural Bioinformatics
Philip E. Bourne
Skaggs School of Pharmacy and Pharmaceutical Sciences
University of California San Diego
pbourne@ucsd.edu

Agenda
• What is structural bioinformatics and how do YOU drive it?
• Prerequisites: the sequence-structure-function relationship
• Some exciting developments
– Using protein structure to study evolution
– Functional prediction, pathway mapping and the RCSB PDB response
• Unsolved problems
– Structure comparison
– Domain definition
• What more could be done to drive the field forward?

What is Structural Bioinformatics?

Personal Definition
• Improving our understanding of living systems through the study of macromolecular structure en masse
Structural Bioinformatics, 2nd Edition, J. Gu and P.E. Bourne (Eds.), John Wiley and Sons, NJ
• Each structure is a data point in an effort to gain a broader understanding

A Field Driven by Your Activity
[Figure: number of released PDB entries by year; depositions to the PDB by decade]

A Field Subject to Some Bias
[Figure: proportion of enzyme classes (oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases) as a percentage of all enzyme structures, by decade]
• Lysozyme: Blake, Koenig, Mair, North, Phillips, Sarma (1965) Nature 206, 757
• Ribonuclease: Kartha, Bello, Harker (1967) Nature 213, 862-865; Wyckoff, Hardman, Allewell, Inagami, Johnson, Richards (1967) J. Biol. Chem. 242, 3753-3757
[Figure: RNA-containing structures by decade: RNA only, protein/RNA complexes, DNA/RNA hybrids, protein/DNA/RNA complexes]
• tRNA: J.L. Sussman, S.-H. Kim (1976) Biochem Biophys Res Commun 68:89-96; J.D. Robertus, J.E. Ladner, J.T. Finch, D. Rhodes, R.S. Brown, B.F.C. Clark, A. Klug (1974) Nature 250: 546-551
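The enzyme-class proportions plotted above come down to tallying the first field of each enzyme's EC number. The following is a minimal sketch, not the method used to make the slide. It assumes the input is a plain list of EC number strings (a hypothetical input format; retrieving the annotations from the PDB is not shown), and the function name is hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable

# The six top-level EC classes shown on the slide.
EC_CLASSES = {
    "1": "Oxidoreductases",
    "2": "Transferases",
    "3": "Hydrolases",
    "4": "Lyases",
    "5": "Isomerases",
    "6": "Ligases",
}


def ec_class_percentages(ec_numbers: Iterable[str]) -> Dict[str, float]:
    """Percentage of enzymes falling in each top-level EC class.

    Entries whose first field is not one of the six classes are skipped.
    """
    counts: Counter = Counter()
    for ec in ec_numbers:
        first = ec.strip().split(".")[0]
        if first in EC_CLASSES:
            counts[EC_CLASSES[first]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# e.g. lysozyme (3.2.1.17), ribonuclease A (3.1.27.5), PKA (2.7.11.11), alcohol dehydrogenase (1.1.1.1)
print(ec_class_percentages(["3.2.1.17", "3.1.27.5", "2.7.11.11", "1.1.1.1"]))
```

Running the same tally separately on each decade's deposits would give one bar of the plot per decade.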
A Field Subject to Some Bias: PDB vs Human Genome
EC 2 (Transferases) Begins to Illustrate the Bias in the PDB
• EC 2.5, transferring alkyl or aryl groups: over-represented in the PDB
• EC 2.4, glycosyltransferases: under-represented in the PDB
[Figure: distribution of EC subclasses in the PDB versus the Ensembl human genome annotation]
Xie and Bourne 2005 PLoS Comp. Biol. 1(3) e31
http://sg.rcsb.org

The Sequence Structure Function Relationship

Sequence vs Structure
[Figure: the classic HSSP curve from Sander and Schneider (1991) Proteins 9:56-68, with the twilight zone and midnight zone marked]

There Are No Absolute Rules: Similar Sequences, Different Structures
• 1PIV:1 (viral capsid protein) vs 1HMP:A (glycosyltransferase)
• An 80-residue stretch (yellow) with over 40% sequence identity

Structure vs Function Follows a Power Law Distribution
• Some folds are promiscuous and adopt many different functions: the superfolds
Qian J, Luscombe NM, Gerstein M. J Mol Biol 2001 313(4):673-81

Examples of Superfolds
[Figure: examples of superfolds]

Structure Is Highly Redundant
[Figure: structure alignments using CE with z > 4.0, illustrating the Russian doll effect]
• Homology modeling is used here
I.N. Shindyalov and P.E. Bourne 2000 Proteins 38(3), 247-260

How Can We Utilize these Seemingly Complex Relationships?
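The HSSP curve is a length-dependent threshold: the shorter the alignment, the higher the percent identity needed before structural similarity can be inferred. The following is a minimal sketch. It assumes the commonly quoted constants for the original Sander and Schneider threshold (100% below 10 residues, 290.15·L^-0.562 for 10 ≤ L ≤ 80, and 24.8% above 80 residues), which should be checked against the paper, and it uses hypothetical helper names:

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> Tuple[float, int]:
    """Percent identity over aligned columns where neither sequence has a gap.

    Returns (percent identity, number of aligned residue pairs L).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0, 0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs), len(pairs)


def hssp_threshold(length: int) -> float:
    """Length-dependent identity threshold (%) in the commonly quoted
    Sander & Schneider (1991) form; the constants are an assumption to verify."""
    if length < 10:
        return 100.0
    if length <= 80:
        return 290.15 * length ** -0.562
    return 24.8


def above_hssp_curve(aligned_a: str, aligned_b: str) -> bool:
    """Hypothetical helper: True if the aligned pair lies on or above the curve."""
    pid, length = percent_identity(aligned_a, aligned_b)
    return pid >= hssp_threshold(length)


print(percent_identity("AC-E", "ACDE"))  # gap column ignored: 3 aligned pairs
print(round(hssp_threshold(80), 2))
```

The 1PIV/1HMP pair shows why the curve is a statistical guide rather than a rule. An 80-residue stretch above 40% identity clears this threshold, yet the two structures differ.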
Using Protein Structure to Study Evolution

Nature's Reductionism
• There are ~20^300 possible proteins (for a chain of 300 residues), far more than all the atoms in the Universe
• 9.5M protein sequences in UniProt/TrEMBL (10/09)
• 38,221 protein structures
• These yield 1195 folds, 1962 superfamilies and 3902 families (SCOP 1.75)

Consider First the Evolutionary History of One Superfamily: the Protein Kinase-like Superfamily
E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol. 1(5): e49.

The Protein Kinase-like Superfamily
• A large superfamily important to signal transduction in eukaryotes and many bacteria
• Phosphotransferases: transfer a phosphate group from ATP to a Ser/Thr or Tyr residue on a target protein, producing a range of downstream signaling effects
• PKA: an example of a typical protein kinase (TPK) fold, shown in "open book" format
(PSB 2007)

The Protein Kinase-Like Superfamily
• A range of different families, all phosphotransferases
• A variety of different targets
• All possess a core cassette of elements shared with the TPKs:
– ATP binding
– Catalysis
• Structures can be highly variable, particularly in the substrate-binding regions

Family | Structural representative | Phosphorylates | Biological result
Typical Protein Kinases (TPKs) | Protein Kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel Kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-Fragmin Kinase (AFK) | Actin-Fragmin Kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline Kinase (CK) | Choline | Part of the pathway that eventually produces phosphatidylcholine, an important constituent of membranes
Aminoglycoside kinases | Aminoglycoside Kinases (AK) | Aminoglycoside antibiotics | Antibiotic resistance

Method
• Begin with a multiple structure alignment, using CE-MC (NAR 2004), of 30 "comparable" TPKs and atypical protein kinases (APKs), and manually correct it pair-wise over a period of 1-2 person-years
• Review the literature on each structure
• Review the associated sequence alignments derived from structure
E. Scheeff and P.E. Bourne 2005 PLoS Comp. Biol. 1(5): e49.
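Structure alignments like the CE-MC alignment above are commonly scored by the RMSD over equivalenced (usually Cα) atoms. A more biological score asks how many residue contacts the equivalence preserves, and the two scores need not agree. The following is a minimal sketch with hypothetical function names. It assumes the two coordinate sets are already optimally superposed, so no rotation is fitted, and that contacts are given as pairs of residue indices:

```python
import math
from typing import Dict, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """RMSD between equivalenced atoms a[i] <-> b[i].

    Assumes the two sets are already superposed; no rotation is fitted here.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(a, b)
    )
    return math.sqrt(total / len(a))


def fraction_contacts_kept(
    contacts_a: Set[Contact], contacts_b: Set[Contact], mapping: Dict[int, int]
) -> float:
    """Fraction of contacts in protein A whose aligned partners are also in contact in B.

    mapping sends a residue index in A to its aligned residue index in B;
    unaligned residues are simply absent from the mapping.
    """
    if not contacts_a:
        return 0.0
    kept = 0
    for i, j in contacts_a:
        if i in mapping and j in mapping:
            pair = (mapping[i], mapping[j])
            if pair in contacts_b or pair[::-1] in contacts_b:
                kept += 1
    return kept / len(contacts_a)


print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
print(fraction_contacts_kept({(1, 2), (2, 3)}, {(11, 12)}, {1: 11, 2: 12, 3: 13}))
```

Scoring two alternative alignments of the same pair of structures with both functions makes the trade-off explicit. A real pipeline would first superpose the structures for each alignment, for example with the Kabsch algorithm.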
Let Us Side-Track for One Minute on Structural Bioinformatics Methodology
Biological vs Geometric Alignments
Plastocyanin versus azurin (from Godzik 1996):
• One alignment maintains 9 of 10 interactions, with an RMSD of 1.5 Å
• Another maintains only 5 of 10 interactions, with an RMSD of 0.5 Å

[Figure: PKA, ChaK ("Channel Kinase"), phosphoinositide-3 kinase (D) and actin-fragmin kinase (E)]

Can We Propose an Evolutionary History for the Protein Kinase-Like Superfamily?
• Bayesian inference of phylogeny (MrBayes)
• Manual structure alignment produces a very high-quality sequence alignment of diverse homologues
• But sequence information is too degraded to produce branching with sufficient support (i.e. a high posterior probability)
• Adding a matrix of structural characters (similar to morphological characters) produces a well-supported combined model
• Neither sequence nor structural characters alone are sufficient to produce a resolved tree; they must be used in combination

Structural characters:
1) Ion pair analogous to K72-E91 in PKA
2) α-Helix B present
3) State of α-Helix C (0: kinked, 1: straight)
4) State of Strand 4 (0: kinked, 1: straight)
5) α-Helix D present

Structure | Group | Characters 1 2 3 4 5
1BO1 | Atypical | 0 0 0 0 1
1IA9 | Atypical | 1 1 1 1 0
1E8X | Atypical | 1 0 1 1 1
1CJA | Atypical | 1 0 1 1 1
1NW1 | Atypical | 1 0 1 0 0
1J7U | Atypical | 1 0 1 0 1
1CDK | AGC | 1 1 1 0 1
1O6L | AGC | 1 1 1 0 1
1OMW | AGC | 1 1 1 0 1
1H1W | AGC | 1 1 1 0 1
1MUO | Other | 1 1 1 0 1
1TKI | CAMK | 1 0 1 0 1
1JKL | CAMK | 1 0 1 0 1
1A06 | CAMK | 1 0 1 0 1
1PHK | CAMK | 1 0 1 0 1
1KWP | CAMK | 1 0 1 0 1
1IA8 | CAMK | 1 0 1 0 0
1GNG | CMGC | 1 0 1 0 1
1HCK | CMGC | 1 0 1 0 1
1JNK | CMGC | 1 0 1 0 1
1HOW | CMGC | 1 0 1 0 1
1LP4 | Other | 1 0 1 0 1
1F3M | STE | 1 0 1 0 1
1O6Y | Other | 1 0 1 0 1
1CSN | CK1 | 1 0 1 0 1
1B6C | TKL | 1 0 1 0 1
2SRC | TK | 1 0 1 0 1
1LUF | TK | 1 0 1 0 1
1IR3 | TK | 1 0 1 0 1
1M14 | TK | 1 0 1 0 1
1GJO | TK | 1 0 1 0 1

Proposed Evolutionary History for the Protein Kinase-Like Superfamily
[Figure: proposed tree. Atypical kinase families in blue (APH, AFK, CK, PI3K, PIPKIIβ, ChaK); typical protein kinase groups (subfamilies) in red (AGC, CAMK, CMGC, TKL, CK1, TK). Branch labels give the posterior probability of each branch (values shown include 0.64, 0.78, 0.85, 0.97 and 1.0).]
• Suggests a distinctive history for the atypical kinases, as opposed to intermittent divergence from the typical protein kinases (TPKs)
• The TPK portion of the tree shows a high degree of agreement with the Manning tree
• Branching is supported by the species representation of the kinase families

What Happens if We Use Structure to Look Across Superfamilies?
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

To Answer this Question We Only Need to Make Use of Existing Resources!
• SCOP further catalogs Nature's reductionism into structural domains, folds, families and superfamilies
• SUPERFAMILY assigns the above to fully sequenced proteomes

Use of SCOP Superfamilies
• How do you distinguish convergent from divergent evolution?
• The SCOP notion of a superfamily, with its evidence of weak sequence relationships, can be used to discount convergence
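The five structural characters tabulated above form a binary character matrix. One plausible way to pass such a matrix to MrBayes is as a NEXUS DATA block with datatype=standard. The following is a minimal sketch with a hypothetical helper name that covers only the structural characters. The combined analysis on the slides also includes the sequence alignment, which would need a mixed-datatype block; that block is not shown here:

```python
from typing import Dict, List


def to_nexus_standard(matrix: Dict[str, List[int]]) -> str:
    """Render a taxon -> 0/1 character-state matrix as a NEXUS DATA block
    using the 'standard' (morphology-style) datatype."""
    if not matrix:
        raise ValueError("matrix is empty")
    nchar = len(next(iter(matrix.values())))
    if any(len(states) != nchar for states in matrix.values()):
        raise ValueError("every taxon needs the same number of characters")
    width = max(len(name) for name in matrix)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(matrix)} nchar={nchar};",
        "  format datatype=standard missing=? gap=-;",
        "  matrix",
    ]
    for name, states in matrix.items():
        row = "".join(str(s) for s in states)
        lines.append(f"    {name.ljust(width)}  {row}")
    lines += ["  ;", "end;"]
    return "\n".join(lines)


# Three rows from the character table (characters 1-5).
print(to_nexus_standard({
    "1BO1": [0, 0, 0, 0, 1],
    "1IA9": [1, 1, 1, 1, 0],
    "1E8X": [1, 0, 1, 1, 1],
}))
```

Encoding each character as a discrete state, rather than as a continuous measurement, mirrors how morphological characters are usually treated in the same framework.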
Structure Provides an Evolutionary Fingerprint
[Figure: distribution of SCOP folds (765 total) among the three superkingdoms, as taken from SUPERFAMILY: Eukaryota (650), Bacteria (564), Archaea (416). Each region is labeled with the number of folds found in any genome / in all genomes.]
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13:1563

The Unique Superfamily in Archaea: d.17.6
• Archaeosine tRNA-guanine transglycosylase (tgt), C2 domain
• First step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine), found in tRNAs
• Was found exclusively in Archaea
Reference: InterPro IPR004804

Method: Distance Determination
Presence/absence data matrix of fold superfamilies (FSFs), by organism:
SCOP superfamily | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1

Distance matrix:
 | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | | 0 | 144
F. rubripes | | | 0

Is Structure a Useful Discriminator of Species? Yes
[Figure: tree of species built from FSF presence/absence, separating Archaea, Bacteria and Eukaryota]
• The method cleanly placed all species in their correct superkingdoms
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

If Structure is so Conserved, is it a Useful Tool in the Study of Evolution? The Answer Would Appear to be Yes
• It is possible to generate a reasonable tree of life merely from the presence or absence of superfamilies (FSFs) within a given proteome
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8

The Influence of Environment on Life
Chris Dupont, Scripps Institution of Oceanography, UCSD
DuPont, Yang, Palenik, Bourne 2006 PNAS 103(47) 17822-17827

Consider the Distribution of Disulfide Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the Earth's evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~2.0 billion years ago
• Logical deduction: disulfides should be more prevalent in folds (organisms) that evolved later
• This would seem to hold true
[Figure: the three-superkingdom fold diagram (708 SCOP folds in total), each region labeled with the percentage of its folds containing disulfides]
• Eukaryota: 31.9% (43/135); the other regions: 0% (0/10), 0% (0/2), 4.7% (18/387), 14.4% (17/118), 5.9% (1/17), 16.7% (7/42)
• Can we take this further?

Evolution of the Earth
• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time
• The ocean was the "cradle" for 90% of evolution

Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth's History
[Figure: theoretical concentrations versus time, from 4.5 billion years before present to the present, of oxygen (arbitrary units), zinc and iron (mol L-1), cobalt and manganese. The periods of Bacteria, Archaea and Eukarya are marked at the top.]
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, so both are shown (oxic ocean: solid lines; euxinic ocean: dashed lines)
• The phylogenetic tree symbols at the top of the figure show one idea of the theoretical periods of diversification for each superkingdom
Replotted from Saito et al. (2003) Inorganica Chimica Acta 356: 308-318

The Gaia Hypothesis
Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/; "land" or "earth", from the Greek Γαῖα) is a Greek goddess personifying the Earth.
Gaia: a complex entity involving the Earth's biosphere, atmosphere, oceans and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet.
James Lovelock

The Question
• Have the emergent properties of an organism, as judged by its protein content, been influenced by the environment?
• We will address this by considering the metallomes of a broad range of species
• The metallomes can only be deduced by considering the protein structures to which the metal is covalently bound
• We hypothesize that these emergent properties in turn influenced the environment

Making the Metallome of Each Species: Can Only be Done from Structure and Requires Human Effort
1. Start with SCOP
2. Each {super}family-level assignment was checked manually for metal binding
3. All the structures representing a family had to bind the metal for it to be considered unambiguous
4. The literature was consulted to resolve ambiguities
5. The SUPERFAMILY database was used to map families to proteomes
6. 23 Archaea, 233 Bacteria, 57 Eukaryota
7. Cu, Ni and Mo were ignored (<0.3% of the proteome)

Levels of Ambiguity
• An ambiguous superfamily binds different metals or has members that are not known to bind metals
• Ditto for families
• Approximately 50% of superfamilies and 10% of families are ambiguous
• Only unambiguous families were used in this study

Superfamily Distribution As Well As Overall Content Has Changed
[Figure: Fe-binding superfamilies in Bacteria versus Eukaryota: a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5]

Fe-Containing Proteins in Bacteria
[Figure: quantile plot over the 108 unique Fe-binding fold families]
• A quantile plot showing the percentage of bacterial proteomes each Fe-binding fold family occurs in (x)
• The plot also shows the average copy number of that fold family in the proteomes where it occurs (♦)
• Few Fe-binding folds are in most proteomes
• Widespread Fe-binding folds are not necessarily abundant
• Similar trends are observed for Zn, Mn and Co in all three superkingdoms

Metal-Binding Proteins are Not Consistent Across Superkingdoms
[Figure, panel A: total Zn-binding domains in a proteome versus total domains in a proteome (log-log axes) for Archaea, Bacteria and Eukarya. Panel B: slope of the fitted power law for Zn, Fe, Mn and Co in each superkingdom.]
• Since these data are derived from current species, they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis

Power Laws: Fundamental Constants in the Evolution of Proteomes
• A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained, in the case of genome reductions)
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP (Eds.) Power laws, scale-free networks, and genome biology

Why are the Power Laws Different for Each Superkingdom?
• Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power-law slopes describing Eukarya and Prokarya are correlated with the shifts in trace metal geochemistry that occur with the rise in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each superkingdom
• This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments

Do the Metallomes Contain Further Support for this Hypothesis?
Superkingdom | Fold family | % | Fe binding | O2
Eukarya | Cytochrome P450 | 0.44 ± 0.48 | heme | yes
Eukarya | Cytochrome c3-like | 0.13 ± 0.3 | heme | no
Eukarya | Cytochrome b5 | 0.12 ± 0.09 | heme | no
Eukarya | Purple acid phosphatase | 0.11 ± 0.08 | amino | no
Eukarya | Penicillin synthase-like | 0.07 ± 0.1 | amino | yes
Eukarya | Hypoxia-inducible factor | 0.07 ± 0.04 | amino | yes
Eukarya | Di-heme elbow motif | 0.06 ± 0.01 | heme | no
Archaea | 4Fe-4S ferredoxins | 1.80 ± 0.7 | Fe-S | no
Archaea | MoCo biosynthesis proteins | 1.60 ± 0.3 | Fe-S | no
Archaea | Heme-binding PAS domain | 1.10 ± 1.0 | heme | see note 1
Archaea | HemN | 0.80 ± 0.20 | Fe-S | no
Archaea | α-helical ferredoxin | 0.60 ± 0.16 | Fe-S | no
Archaea | Biotin synthase | 0.55 ± 0.1 | Fe-S | no
Archaea | ROO N-terminal domain-like | 0.5 ± 0.1 | amino | see note 2
Archaea | High potential iron protein | 0.38 ± 0.25 | Fe-S | no
Bacteria | Heme-binding PAS domain | 0.3 ± 0.4 | heme | see note 1
Bacteria | MoCo biosynthesis proteins | 0.21 ± 0.15 | Fe-S | no
Bacteria | HemN | 0.2 ± 0.15 | Fe-S | no
Bacteria | 4Fe-4S ferredoxins | 0.2 ± 0.2 | Fe-S | no
Bacteria | Cytochrome c | 0.14 ± 0.2 | heme | no
Bacteria | α-helical ferredoxin | 0.12 ± 0.09 | Fe-S | no

Overall percent of Fe bound by:
Superkingdom | Fe-S | heme | amino
Eukarya | 21 ± 9 | 47 ± 19 | 32 ± 12
Archaea | 68 ± 12 | 13 ± 14 | 19 ± 6
Bacteria | 47 ± 11 | 22 ± 12 | 31 ± 16

1. Some, but not all, PAS domains actually sense oxygen
2. The rubredoxin:oxygen oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway

e⁻ Transfer Proteins: Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
Fe-S clusters | Cytochromes
Fe bound by S | Fe bound by heme (and amino acids)
Cluster held in place by Cys |
Generally negative reduction potentials | Generally positive reduction potentials
Less susceptible to oxidation | Very susceptible to oxidation

Hypothesis
• The emergence of cyanobacteria changed oxygen concentrations
• This impacted relative metal ion concentrations in the ocean
• Organisms evolved to use these metals in new ways, producing new biological processes, e.g. complex signaling
• This in turn further impacted the environment
• Only protein structures could reveal such dependencies

Unsolved Problems: 3D Domain Definition

Our Methods are Still Not Good Enough: The 3D Domain Assignment Problem
A domain is a fundamental structural, functional and evolutionary unit of a protein. Domains:
• Are compact
• Are stable
• Have a hydrophobic core
• Fold independently
• Perform a specific function
• Can be re-shuffled and put together in different combinations
Evolution works at the level of the domain.

Evaluation of Automatic Domain Assignment Methods
Structures with issues (all/most methods):
• Large structures, complex architectures (e.g. 1dcea)
• Very small, simple domains: difficult to separate
• Issues: minimum domain size, low contact density
Number of domains assigned to example chains:
• 1ubdc: experts 3; NCBI method, PDP, DomainParser 5; PUU 6
• 1bxrc: experts 4; NCBI method 4; DomainParser 2; PDP, PUU 2
• 1e88a: experts 3; PUU 1; PDP 2
• Also shown: experts 6; DomainParser 5; PUU 2; PDP 2; NCBI 2; NCBI methods 8

Manual vs. Automatic Consensus
• Chains with manual consensus: 375 (80% of entire dataset)
• Chains with automatic consensus: 374 (80% of entire dataset)
• Chains with consensus (automatic or manual): 424 (90.6% of entire dataset)
• Automatic consensus only: 46 chains (10.9% of chains with consensus)
• Manual consensus only: 47 chains (11.1% of chains with consensus)
• Manual and automatic consensus agree: 328 chains (77.3% of chains with consensus)
• Automatic and manual consensus disagree: 3 chains (0.7% of chains with consensus)
JMB 2004 339(3), 647-678
Natalie Dawson, unpublished; http://itol.embl.de/

Structure Determination or Modeling of a Whole Metabolic Network
What are the implications of this?
• Biochemical reactions, pathways and networks can now be described in the context of entire cells
• Enables more realistic simulations of the behavior of metabolic networks
• Better understanding of evolution: compare pathways between organisms
• Predict effects of mutations and drugs
• Synthetic biology

What More Could be Done to Drive the Field Forward?

Better Interoperability Between the Data and the Literature Upon Which it is Based
[Figure: a spectrum running from data (databases) to knowledge (knowledgebases), labeled: data only; data + annotation; data + some annotation; data + some annotation + some integration; wikis, datapacks, journals; PLoS; iStructure]

The Database View
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

The Literature View: Web 3.0?
http://betastaging.rcsb.org

Acknowledgements
• Protein-protein interactions: JoLan Chung & Wei Wang
• Functional flexibility: Jenny Gu & Michael Gribskov
• Multipolar representation: Apostol Gramada
• Funding: NSF, NIH