
AST 376/381 Planets and Life: Assignment 4:
The Protein Fold Universe and Evolution of Early Proteins
The goal of the new version of this assignment is threefold:
to understand why so many researchers are studying the
high-level structure and structural classification of
proteins; to see its implications for theories of the
evolutionary development of life on Earth; and finally to
concentrate your thought on the phenomenon of conformation
(folding in this case) as a primary activity of life that
may hold clues to its origin. A paper you will write at the
end of this exercise will tie together your (future)
understanding of weak noncovalent forces in folding and
other conformations, the specific ways in which this generic
process has been used to develop complex organisms on Earth
(which you will learn doing this work), and your
speculations on what conditions would be required for its
occurrence elsewhere. By the time you finish, you will
understand why I used the figure to the right as a symbol
for the current status of the field.
[Figure: image not available in this copy.]
Background
The high-order structure of proteins, and the
classification of that structure, is of interest not only
to theoreticians who want to solve the “protein folding
problem,” but also to evolutionary biologists, who can now
use the full sequence data for the ~200 organisms whose
complete genomes have been sequenced to trace the evolution
of protein folding patterns. This is a subject that could
only be guessed at previously. Did DNA damage repair
proteins come before metabolic proteins? What subset of the
complex metabolic system came first? Answers to these
questions are not yet known, but there is some hope that
eventually they will be found. That is the goal of the
researchers you will encounter in this assignment.
To begin you will need some background on the major
biopolymers used in terrestrial life, so that you are
comfortable with each of them. You should be able to
explain their structure and function(s) to someone using
informal language. I will give you some simple homework
problems to motivate you along the way--some will be
optional. There are several readings to help you along,
including some already handed out in class. I will give you
a separate list of the readings you should already have and
of those that will be arriving soon. The Cooper chapter that
I keep mentioning is on our shelf in Peridier (along with
Armitage and Papaloizou & Terquem on planet formation, in
case you have lost that or never printed it). To be safe, I
am handing it out, along with other background reading that
is needed, today and Friday.
With that background, the goals of this assignment
are:
1. Learn about the possibility of uncovering the
evolutionary history of proteins through the use of “domain
classification” beginning at the CATH domain classification
web site.
2. Use the (I hope) intrinsically interesting
questions about our conceptions of evolution raised by this
research to motivate you to read the current materials on
biopolymers in depth, and especially the soon-to-come
readings on DNA/RNA and proteins, without which you may
have little chance of understanding the material here.
3. Use this as a starting point for a crash course in
current views of evolutionary dynamics: in this case, the
importance of gene duplication in driving evolutionary
development, and whether the current developments in genome
dynamics (we’ll read and discuss others) could displace the
conventional picture of mutation/selection as the central
conception of evolution. Most importantly for this course,
always try to pare down what you are told about, imagining
the minimal complexity that you think could still function.
Concentrate on seeing folding as it might operate in a very
primitive form.
You should take notes on each part of this assignment,
including the readings and any impressions you have, lists
of molecules, in order to write a coherent, original report
on this investigation when you are through. I’ll fill you
in later on how much more specific the report should be; for
now I will only tell you that I expect it to be a
well-produced review that exhibits your familiarity with
some part of the problems involved, with plenty of
instructive graphics that you will have culled from the
1000s of images at the sites you visit. The paper is due
Wed. April 11 at the latest. That gives you 12 days, so
don’t put off the readings. In the meantime there will be
other readings on late early terrestrial planet evolution
and the conditions for the beginnings of life, so plan
accordingly.
Finally, here are the steps in the assignment:
Assignment (approached partly through hands-on work at the
CATH and then 3D viewer web sites).
1. Go to the CATH Protein Structure Classification site at
http://www.cathdb.info/latest/index.html
Type in “mainly alpha” in the search box. By the end of
this you will know why such an unlikely phrase got a useful
response, and even what CATH stands for. A new window will
appear with a list of molecules that looks like:
PDB code   Header
1bag       Alpha-Amylase
1bil       Hydrolase (Alpha-Aminoacylpeptide)
1lcp       Hydrolase (Alpha-Aminoacylpeptide)
1qah       Alpha-Beta Structure
1col       Alpha-Helical Bundle
1cos       Alpha-Helical Bundle
and thumbnail images of the tertiary structures.
Scroll down and see a larger list of molecules of interest.
You can see how the proteins are classified here, learn
what that means with regard to their structure, and then
use the PDB code to view each one in 3D at the Protein Data
Bank. I will be asking you to include a few images of some
proteins you think are key in this research, so you will
still need to do some 3D viewing.
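If you end up collecting many PDB codes, a few lines of Python can turn each code into links you can paste into a browser. This is only a minimal sketch: the function name `pdb_links` is made up for illustration, and the two URL patterns are the present-day RCSB ones, which are an assumption here since they do not appear in this handout and may change.

```python
def pdb_links(pdb_code):
    """Build browser links for a 4-character PDB code such as '1bag'.

    The URL patterns below are assumptions (current RCSB conventions),
    not something taken from the assignment handout.
    """
    code = pdb_code.strip().upper()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a 4-character PDB code: {pdb_code!r}")
    return {
        # Summary page with the interactive 3D viewer
        "page": f"https://www.rcsb.org/structure/{code}",
        # Plain-text coordinate file in PDB format
        "file": f"https://files.rcsb.org/download/{code}.pdb",
    }


# Codes taken from the sample search results listed above
for code in ["1bag", "1col", "1cos"]:
    print(code, pdb_links(code)["page"])
```

Keeping the codes in a list like this also gives you a ready-made record of which structures you looked at for your report.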
Now, under “Navigation” on the left, hit “top of
hierarchy” and see that there are four main categories
here. Explore them. Go to the explanation of the CATH
structure classification procedure at:
http://cathwww.biochem.ucl.ac.uk/cgibin/cath/GotoCath.pl?link=cath_info.html
You will see many terms that are unfamiliar. What is
a protein domain? Homologous superfamily? Fold group?
Consider the types of structures in the illustration
at the bottom of the page. This is the “Architecture” or
“A” level (the A in CATH) and describes the overall shape
of the domain structure, ignoring connectivity between the
secondary structures. They have names like “barrel,”
“3-layer sandwich,” or “beta-propeller.” A few of these are
shown below.
[Figure: examples of CATH architectures; image not available in this copy.]
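As you browse, you will notice that CATH labels each node of its hierarchy with a dotted number such as 1.10. The sketch below shows one way to read such a code level by level. It is an illustration under stated assumptions: the function name `describe_cath_code` is hypothetical, and the level names and the four class names are typed in from how the CATH site presents them, so check them against the site rather than trusting this table.

```python
# Level names, top to bottom (assumed from the CATH site; verify there).
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")

# The four top-level categories (names assumed; verify on the site).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


def describe_cath_code(code):
    """Split a dotted CATH code like '1.10' into labelled levels."""
    parts = [int(p) for p in code.split(".")]
    if not 1 <= len(parts) <= len(CATH_LEVELS):
        raise ValueError(f"expected 1 to 4 dotted numbers, got {code!r}")
    class_name = CATH_CLASSES.get(parts[0], "unknown class")
    # zip stops at the shorter sequence, so partial codes work too
    return class_name, list(zip(CATH_LEVELS, parts))


class_name, levels = describe_cath_code("1.10")
print(class_name)
for level, number in levels:
    print(f"  {level}: {number}")
```

The point of the exercise is the nesting: every code that starts with the same first number belongs to the same class, every code that shares the first two numbers has the same architecture, and so on down the hierarchy.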
2. In order to answer the question “What will you learn?”
read the abstract to the primary background review paper
(it is at a technical level):
Protein families and their evolution - A structural
perspective. Orengo CA, Thornton JM. (2005)
Annual Review of Biochemistry. Vol 74. p. 867-900.
on the motivation behind classification schemes like the
CATH approach. (This paper can be downloaded from the
course web site.) This review paper is not really an
assigned reading, and probably could not be understood by
any of us at the present time. I do recommend trying to
read the introduction, looking at the section headings and
illustrations, etc., just to see if you can get the general
idea. It will be handy to know why you are doing what you
are doing!
Read the abstract carefully. You will see that this
is really research in evolution, not more people trying to
construct an energy function to plug into Schrödinger’s
equation. It is an attempt to see how protein domain
structures that are found in common among organisms from
bacteria to humans can be used to understand how a process
called gene duplication (and other processes) has been used
at the genome level to advance the functionality and
complexity of life. In addition, researchers are trying to
trace back the evolution of protein domains to find out, for
example, whether most proteins used in synthesis of nucleic
acids (i.e. in replication) developed before or after those
associated with some metabolic process (e.g.
photosynthesis). It’s not as obvious as it sounds.
For now, just try to learn about protein domains and
domain families, however you can, in preparation for our
look at gene duplication (and even whole genome
duplication) as a primary process in evolution, perhaps
even making mutations a second-order effect.
ABSTRACT: We can now assign about two thirds of the
sequences from completed genomes to as few as 1400 domain
families for which structures are known and thus more
ancient evolutionary relationships established. About 200
of these domain families are common to all kingdoms of life
and account for nearly 50% of domain structure annotations
in the genomes. Some of these domain families have been
very extensively duplicated within a genome and combined
with different domain partners giving rise to different
multidomain proteins. The ways in which these domain
combinations evolve tend to be specific to the organism so
that less than 15% of the protein families found within a
genome appear to be common to all kingdoms of life. Recent
analyses of completed genomes, exploiting the structural
data, have revealed the extent to which duplication of
these domains and modifications of their functions can
expand the functional repertoire of the organism,
contributing to increasing complexity.
3. Search on “protein domain” or “domain structure” at
Wikipedia, look over the material, visit some of the links
(for terminology you may not be familiar with), and see the
links to at least four more “fold libraries” (as they’re
called in the field). Visit them to find out whether their
orientation is different from CATH. Record your
preliminary findings. Did you come across tutorial
background material that is accessible to, say, upper
division college students, and is not aimed only at
biochemists or grade-schoolers? Keep a list of any links to
tutorials that seem helpful. Read the easy-level Wikipedia
presentation completely.
4. Obtain a broader perspective, or at least an opinion on
developments since 1999, by taking a look at this
8-year-old review paper, which you should try to read in
detail and compare, in as much detail, with the more recent
work.
Protein folds, functions and evolution.
Thornton JM, Orengo CA, Todd AE, Pearl FM.
J Mol Biol. 1999 Oct 22;293(2):333-42.
Summarize what you have learned, and particularly whether it
appears that this field has fulfilled the promise that was made in this
paper. Use the recent papers at the course web site for this (or tackle
the 2005 review paper).
Here are some later papers in journals that are not too technical, that you should
try to look at while you are exploring the CATH and other sites. They should all be
online at the web site for you to download.
Exploiting protein structure data to explore the evolution of protein
function and biological complexity.
Marsden RL, Ranea JA, Sillero A, Redfern O, Yeats C, Maibaum M, Lee D, Addou S,
Reeves GA, Dallman TJ, Orengo CA.
Philos Trans R Soc Lond B Biol Sci. 2006 Mar 29;361(1467):425-40. Review.
A more recent but difficult paper by this group is:
Towards a comprehensive structural coverage of completed genomes: a structural
genomics viewpoint.
Marsden RL, Lewis TA, Orengo CA.
BMC Bioinformatics. 2007 Mar 9;8:86.
In case you want to learn more about what you can do at the CATH site, here is a recent
reference, but it is likely to be very technical:
The CATH domain structure database: new protocols and classification levels give a
more comprehensive resource for exploring evolution.
Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F,
Nambudiry R, Reid A, Sillitoe I, Yeats C, Thornton JM, Orengo CA.
Nucleic Acids Res. 2007 Jan;35:D291-7.
Here are a few articles that sounded interesting:
Protein superfamily evolution and the last universal common ancestor (LUCA).
Ranea JA, Sillero A, Thornton JM, Orengo CA.
J Mol Evol. 2006 Oct;63(4):513-25.
Supra-domains: evolutionary units larger than single protein domains.
Vogel C, Berzuini C, Bashton M, Gough J, Teichmann SA.
J Mol Biol. 2004 Feb 20;336(3):809-23.
Convergent evolution of domain architectures (is rare).
Gough J.
Bioinformatics. 2005 Apr 15;21(8):1464-71. Epub 2004 Dec 7.
5. Go to the 3D viewing sites that were the original
assignment vehicle, and find out whether the data on
families, domains, fold classes, etc. are used there, and
take the opportunity to view the 3D structure of a few of
the more common motifs (like the “beta-propeller”). After
some exploration, write a paragraph (or more) explaining
whether you have found any way to tie together the two
kinds of sites: the structural classification sites like
CATH, and the 3D viewing sites like the Protein Data Bank.
For example, is the “sequence information” that CATH used
to make their classifications available at PDB?
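One way to check this for yourself is to open a downloaded coordinate file in a text editor and look for lines beginning with SEQRES, which in the PDB file format list the residues of each chain by three-letter name. The sketch below reads such lines and prints a one-letter sequence per chain. It is a hedged illustration: the function name `chain_sequences` is hypothetical, the column positions follow the fixed-column SEQRES layout, and the two sample lines are hand-written for demonstration rather than copied from a real entry.

```python
# Standard three-letter to one-letter amino acid codes
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_lines):
    """Collect one-letter sequences per chain from SEQRES records."""
    chains = {}
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]            # column 12 holds the chain ID
        residues = line[19:].split()   # residue names start at column 20
        # Anything outside the 20 standard residues is shown as 'X'
        letters = "".join(THREE_TO_ONE.get(r, "X") for r in residues)
        chains[chain_id] = chains.get(chain_id, "") + letters
    return chains


# Hand-written sample records (illustrative only, not a real PDB entry)
SAMPLE_LINES = [
    "SEQRES   1 A   15  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
    "SEQRES   2 A   15  TYR GLN",
]
print(chain_sequences(SAMPLE_LINES))  # {'A': 'GIVEQCCTSICSLYQ'}
```

To try it on a real structure, read the lines of a file you downloaded yourself (for example with `open("yourfile.pdb").read().splitlines()`) and pass them to the same function.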
6. Finally, integrate your notes and ideas into a short
written paper that discusses the nature of folding and its
role in evolution, with a focus on the possibility that it
played a crucial role in the development of the earliest
life. By the time you write this we will have covered some
of the requisite topics.
Here are some links to 3D viewing sites:
Protein Data Bank: http://www.rcsb.org/pdb/home/home.do
While there, take advantage of the past “molecules of
the month” list and read about some unusual biopolymers.
Molecules to Go: http://molbio.info.nih.gov/cgi-bin/pdb
Swiss-Prot: http://www.expasy.ch/ (This is actually the
umbrella site, the ExPASy Proteomics Server. It is the
proteomics server of the Swiss Institute of
Bioinformatics.)
A course?
http://swissmodel.expasy.org//course/text/chapter4b.htm
If you think you’ve seen a collection of links before,
take a look at this one:
http://www.expasy.ch/links.html#Proteins
You should be able to find every protein structure group in
the world from here.
http://people.ouc.bc.ca/woodcock/molecule/molecule.html
U. So. Maine tutorial for Deep View-Swiss-PdbViewer, for
the beginning molecular modeler or viewer [Note: this might
only run on OS9]
http://www.usm.maine.edu/~rhodes/SPVTut/index.html
World Index of Molecular Visualization Resources:
http://molvis.sdsc.edu/visres/index.html
related pages:
http://www.molvisions.com/
http://molvis.sdsc.edu/visres/index.html#c-rtu
http://molvis.sdsc.edu/visres/deepview/titles.jsp
While I’m at it, a good list of DNA sites is:
http://molvis.sdsc.edu/dna/moredna.htm
Lehninger 3D Structure Tutorials
http://www.worthpublishers.com/lehninger3d/lold/index.html
Lehninger is one of the world’s most admired biochemistry textbooks, which most
biochemists and medical students must get through in their first few
semesters. It has a great online viewing/tutorial site (I hear), but it requires
that you download the Chime software. If you are computer-savvy and want to
do this, give it a try.