AST 376/381 Planets and Life: Assignment 4
The Protein Fold Universe and Evolution of Early Proteins

The goal of the new version of this assignment is threefold: to understand why so many researchers are studying the high-level structure and structural classification of proteins; to see its implications for theories of the evolutionary development of life on Earth; and finally to concentrate your thought on the phenomenon of conformation (folding, in this case) as a primary activity of life that may hold clues to its origin. A paper you will write at the end of this exercise will tie together your (future) understanding of weak noncovalent forces in folding and other conformations, the specific ways in which this generic process has been used to develop complex organisms on Earth (which you will learn by doing this work), and your speculations on what conditions would be required for its occurrence elsewhere. By the time you finish, you will understand why I used the figure to the right as a symbol for the current status of the field.

[Figure: image not available]

Background

The high-order structure of proteins, and the classification of this structure, is of interest not only to theoreticians who want to solve the “protein folding problem,” but also to evolutionary biologists, who can now use the full sequence data for the ~200 organisms whose complete genomes have been sequenced to trace the evolution of protein folding patterns. This is a subject that could only be guessed at previously. Did DNA damage repair proteins come before metabolic proteins? What subset of the complex metabolic system came first? Answers to these questions are not yet known, but there is some hope that they can eventually be found. That is the goal of the researchers you will encounter in this assignment.

To begin you will need some background on the major biopolymers used in terrestrial life, so that you are comfortable with each of them.
You should be able to explain their structure and function(s) to someone using informal language. I will give you some simple homework problems to motivate you along the way; some will be optional. There are several readings to help you along, including some already handed out in class. I will give you a separate list of the readings you should already have and of those that will be arriving soon. The Cooper chapter that I keep mentioning is on our shelf in Peridier (along with Armitage and Papaloizou & Terquem on planet formation, in case you have lost that or never printed it). To be sure, I am handing it out, along with other background reading that is needed, today and Friday.

With that background, the goals of this assignment are:

1. Learn about the possibility of uncovering the evolutionary history of proteins through the use of “domain classification,” beginning at the CATH domain classification web site.

2. Use the (I hope) intrinsically interesting questions about our conceptions of evolution raised by this research to motivate you to read the current materials on biopolymers in depth, and especially the soon-to-come readings on DNA/RNA and proteins, without which you may have little chance of understanding the material here.

3. Use this as a starting point for a crash course in current views of evolutionary dynamics: in this case, the importance of gene duplication in driving evolutionary development, and whether the current developments in genome dynamics (we’ll read and discuss others) could displace the conventional picture of mutation/selection as the central conception of evolution.

Most importantly for this course, try always to pare down what you are told about, trying to imagine the minimal complexity that you think could still function. Concentrate on seeing folding as it might operate in a very primitive form.
You should take notes on each part of this assignment, including the readings, any impressions you have, and lists of molecules, in order to write a coherent, original report on this investigation when you are through. I’ll fill you in later on how specific the report should be; for now I will only tell you that I expect it to be a well-produced review that exhibits your familiarity with some part of the problems involved, with plenty of instructive graphics that you will have culled from the thousands of images at the sites you visit. The paper is due no later than Wed. April 11. That gives you 12 days, so don’t put off the readings. In the meantime there will be other readings on late early terrestrial planet evolution and the conditions for the beginnings of life, so plan accordingly. Finally, here are the steps in the assignment:

Assignment (approached partly through hands-on work at CATH and then at 3D viewer web sites)

1. Go to the CATH Protein Structure Classification site at http://www.cathdb.info/latest/index.html and type “mainly alpha” in the search box. By the end of this you will know why such an unlikely phrase got a useful response, and even what CATH stands for. A new window will appear with a list of molecules that looks like:

PDB code   Header
1bag       Alpha-Amylase
1bil       Hydrolase (Alpha-Aminoacylpeptide)
1lcp       Hydrolase (Alpha-Aminoacylpeptide)
1qah       Alpha-Beta Structure
1col       Alpha-Helical Bundle
1cos       Alpha-Helical Bundle

along with thumbnail images of the tertiary structures. Scroll down to see a larger list of molecules of interest. You can see how the proteins are classified here, learn what the classification means with regard to their structure, and then use the PDB code to view each one in 3D at the Protein Data Bank. I will be asking you to include a few images of some proteins you think are key in this research, so you will still need to do some 3D viewing.

Now, under “Navigation” on the left, hit “top of hierarchy” and see that there are four main categories here.
Explore them.

Go to the explanation of the CATH structure classification procedure at http://cathwww.biochem.ucl.ac.uk/cgibin/cath/GotoCath.pl?link=cath_info.html

You will see many terms that are unfamiliar. What is a protein domain? A homologous superfamily? A fold group? Consider the types of structures in the illustration at the bottom of the page. This is the “Architecture” or “A” level (the A in CATH), which describes the overall shape of the domain structure, ignoring connectivity between the secondary structures. The architectures have names like “barrel,” “3-layer sandwich,” and “beta-propellor.” A few of these are shown below.

[Figure: examples of domain architectures; image not available]

2. In order to answer the question “What will you learn?” read the abstract of the primary background review paper (it is at a technical level), which covers the motivation behind classification schemes like the CATH approach:

Protein families and their evolution - A structural perspective. Orengo CA, Thornton JM. (2005) Annual Review of Biochemistry. Vol 74. p. 867-900.

(This paper can be downloaded from the course web site.) This review paper is not really an assigned reading, and probably could not be fully understood by any of us at the present time. I do recommend trying to read the introduction, looking at the section headings and illustrations, etc., just to see if you can get the general idea. It will be handy to know why you are doing what you are doing!

Read the abstract carefully. You will see that this is really research in evolution, not more people trying to construct an energy function to plug into Schrödinger’s equation. It is an attempt to see how protein domain structures that are found in common among organisms from bacteria to humans can be used to understand how a process called gene duplication (and other processes) has been used at the genome level to advance the functionality and complexity of life.
In addition, researchers are trying to trace back the evolution of protein domains to find, for example, whether most proteins used in the synthesis of nucleic acids (i.e. in replication) developed before or after those associated with some metabolic process (e.g. photosynthesis). It’s not as obvious as it sounds. For now, just try to learn about protein domains and domain families, however you can, in preparation for our look at gene duplication (and even whole genome duplication) as a primary process in evolution, perhaps even making mutations a second-order effect.

ABSTRACT: We can now assign about two thirds of the sequences from completed genomes to as few as 1400 domain families for which structures are known and thus more ancient evolutionary relationships established. About 200 of these domain families are common to all kingdoms of life and account for nearly 50% of domain structure annotations in the genomes. Some of these domain families have been very extensively duplicated within a genome and combined with different domain partners giving rise to different multidomain proteins. The ways in which these domain combinations evolve tend to be specific to the organism so that less than 15% of the protein families found within a genome appear to be common to all kingdoms of life. Recent analyses of completed genomes, exploiting the structural data, have revealed the extent to which duplication of these domains and modifications of their functions can expand the functional repertoire of the organism, contributing to increasing complexity.

3. Search on “protein domain” or “domain structure” at Wikipedia, look over the material, visit some of the links (for terminology you may not be familiar with), and see the links to at least four more “fold libraries” (as they’re called in the field). Visit them to find out whether their orientation is different from CATH’s. Record your preliminary findings.
Did you come across tutorial background material that is accessible to, say, upper division college students, and that is made for neither biochemists nor grade-schoolers? Keep a list of any links to tutorials that seem helpful. Read the easy-level Wikipedia presentation completely.

4. Obtain a broader perspective, or at least an opinion on developments since 1999, by taking a look at this 8-year-old review paper, which you should try to read in detail and compare, in as much detail, with more recent work:

Protein folds, functions and evolution. Thornton JM, Orengo CA, Todd AE, Pearl FM. J Mol Biol. 1999 Oct 22;293(2):333-42.

Summarize what you have learned, and particularly whether it appears that this field has fulfilled the promise that was made in this paper. Use the recent papers at the course web site for this (or tackle the 2005 review paper).

Here are some later papers, in journals that are not too technical, that you should try to look at while you are exploring the CATH and other sites. They should all be online at the web site for you to download.

Exploiting protein structure data to explore the evolution of protein function and biological complexity. Marsden RL, Ranea JA, Sillero A, Redfern O, Yeats C, Maibaum M, Lee D, Addou S, Reeves GA, Dallman TJ, Orengo CA. Philos Trans R Soc Lond B Biol Sci. 2006 Mar 29;361(1467):425-40. Review.

A more recent but difficult paper by this group is:

Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. Marsden RL, Lewis TA, Orengo CA. BMC Bioinformatics. 2007 Mar 9;8:86.

In case you want to learn more about what you can do at the CATH site, here is a recent reference, but it is likely to be very technical:

The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res. 2007 Jan;35:D291-7.
Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA.

Here are a couple of articles that sounded interesting:

Protein superfamily evolution and the last universal common ancestor (LUCA). Ranea JA, Sillero A, Thornton JM, Orengo CA. J Mol Evol. 2006 Oct;63(4):513-25.

Supra-domains: evolutionary units larger than single protein domains. Vogel C, Berzuini C, Bashton M, Gough J, Teichmann SA. J Mol Biol. 2004 Feb 20;336(3):809-23.

Convergent evolution of domain architectures (is rare). Gough J. Bioinformatics. 2005 Apr 15;21(8):1464-71. Epub 2004 Dec 7.

5. Go to the 3D viewing sites that were the original assignment vehicle, and find out whether the data on families, domains, fold classes, etc. have been studied there. Take the opportunity to view the 3D structure of a few of the more common motifs (like the “beta-propellor”). After some exploration, write a paragraph (or more) explaining whether you have found any way to tie together the two kinds of sites: the structural categorization sites like CATH, and the 3D viewing sites like the Protein Data Bank. For example, is the “sequence information” that CATH used to make its classifications available at PDB?

6. Finally, integrate your notes and ideas into a short written paper that discusses the nature of folding and its role in evolution, with a focus on the possibility that it played a crucial role in the development of the earliest life. By the time you write this we will have covered some of the requisite topics.

Here are some links to 3D viewing sites:

Protein Data Bank: http://www.rcsb.org/pdb/home/home.do
While there, take advantage of the past “molecules of the month” list and read about some unusual biopolymers.

Molecules to Go: http://molbio.info.nih.gov/cgi-bin/pdb

Swiss-Prot: http://www.expasy.ch/ (This is actually the parent site for the ExPASy Proteomics Server.
It is the proteomics server of the Swiss Institute of Bioinformatics.)

A course? http://swissmodel.expasy.org//course/text/chapter4b.htm

If you think you’ve seen a collection of links before, take a look at this one: http://www.expasy.ch/links.html#Proteins
You should be able to find every protein structure group in the world from here.

http://people.ouc.bc.ca/woodcock/molecule/molecule.html

U. So. Maine tutorial for Deep View-Swiss-PdbViewer, for the beginning molecular modeler or viewer [Note: this might only run on OS9]: http://www.usm.maine.edu/~rhodes/SPVTut/index.html

World Index of Molecular Visualization Resources: http://molvis.sdsc.edu/visres/index.html
Related pages:
http://www.molvisions.com/
http://molvis.sdsc.edu/visres/index.html#c-rtu
http://molvis.sdsc.edu/visres/deepview/titles.jsp

While I’m at it, a good list of DNA sites is: http://molvis.sdsc.edu/dna/moredna.htm

Lehninger 3D Structure Tutorials: http://www.worthpublishers.com/lehninger3d/lold/index.html
Lehninger is one of the world’s most admired biochemistry textbooks, which most biochemists and medical students must get through in their first few semesters. It has a great online viewing/tutorial site (I hear), but it requires that you download the Chime software. If you are computer-savvy and want to do this, give it a try.
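
If you would rather keep your list of molecules in a small script than on paper, here is a minimal Python sketch of how the PDB codes from the CATH listing in step 1 might be checked and turned into links for 3D viewing. It assumes the classic four-character PDB code (a digit followed by three letters or digits). The entry-page URL pattern and the function names `is_pdb_code` and `entry_url` are my own assumptions for illustration, not something taken from the CATH or PDB sites, so check the links against the live Protein Data Bank site before relying on them.

```python
import re

# Classic PDB codes are four characters: a digit 1-9 followed by three
# letters or digits (for example "1bag").
PDB_CODE_PATTERN = re.compile(r"[1-9][A-Za-z0-9]{3}")

# ASSUMPTION: URL pattern for an entry page at the Protein Data Bank.
# It is not given in the assignment; verify it against the live site.
ENTRY_URL_TEMPLATE = "https://www.rcsb.org/structure/{code}"


def is_pdb_code(text: str) -> bool:
    """Return True if text looks like a classic four-character PDB code."""
    return PDB_CODE_PATTERN.fullmatch(text.strip()) is not None


def entry_url(code: str) -> str:
    """Build the (assumed) entry-page link for one PDB code."""
    code = code.strip()
    if not is_pdb_code(code):
        raise ValueError(f"not a four-character PDB code: {code!r}")
    return ENTRY_URL_TEMPLATE.format(code=code.upper())


if __name__ == "__main__":
    # PDB codes copied from the "mainly alpha" listing in step 1.
    codes = ["1bag", "1bil", "1lcp", "1qah", "1col", "1cos"]
    for code in codes:
        print(code, entry_url(code))
```

Keeping the codes in one list like this makes it easy to add a note beside each molecule (its class, its architecture, why you chose it) as you collect images for the paper.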