LONG ENTRY
The Relationship Between the Divergence of Amino-acid Sequence and Protein
Structure
Arthur M. Lesk
The Pennsylvania State University
Department of Biochemistry and Molecular Biology and the Huck Institute of
Genomics, Proteomics and Bioinformatics
University Park, PA 16801
U.S.A.
Introduction
The changes in protein structure in response to changes in amino acid
sequence show how proteins evolve novel structures and functions, and also
illuminate the mechanism by which amino acid sequence determines protein
structure.
Appreciating the relationship between amino-acid sequence and conformation
over a family of proteins can provide the key to understanding the structure and
function of even individual proteins. For, looking at a single protein, it is often
unclear – except in the cases of obvious catalytic residues in an active site – what
the importance and roles of particular residues are. Nature can do the analysis for
us, by providing patterns of conservation: simplistically, residues important for
structure and function are conserved, those less essential are variable. The
situation illustrates Dobzhansky’s assertion that: ‘Nothing in biology makes sense
except in the light of evolution.’ (Dobzhansky, 1973).
The correlation of changes in sequence, structure and function through
evolution has occupied molecular biologists since the first structures were
determined. When Kendrew, Perutz and colleagues first determined the structures
of sperm whale myoglobin and human haemoglobin (Kendrew, et al. 1958;
Perutz, et al. 1960), they observed that the structures of the haemoglobin subunits
and myoglobin resembled one another. Note that the amino acid sequences of
globins had not yet been determined: Although Sanger had established the amino
acid sequence of insulin in 1951 (Sanger and Tuppy, 1951a,b), the complete amino
acid sequence of human haemoglobin was not published until ten years later
(Braunitzer et al. 1961). Thus the first comparisons of complete globins in terms
of structure were earlier than comparisons based on sequences.
Much earlier, in fact: Some fifty years before, E.T. Reichert and A.P. Brown
(1909) published a study of crystals of haemoglobin isolated from different
species of fishes. Haemoglobin crystallography, three years before the discovery
of X-ray diffraction, was limited to measuring the angles between crystal faces.
Stenö’s law (1669) states that the interfacial angles of all crystals of a substance are
the same, independent of the size and macroscopic shape of the crystal. Reichert
and Brown found that the patterns of divergence of these angles correlated with
the evolutionary tree of the species of fishes. They even found differences
between crystals of deoxy- and oxyhaemoglobin. This work has my nomination
for the most premature scientific result ever published.
We can now interpret and appreciate these observations: The formation of
crystals implies that the molecules can take up a definite structure, able to pack
into regular arrays. The differences in interfacial angles imply that crystals of
haemoglobins from different fishes have different structures. The correlation of
the divergence patterns of the crystals and the species implies that evolution is
shaping molecules as well as bodies, in parallel processes. The differences between
crystals of deoxy- and oxyhaemoglobin imply that the protein undergoes a
conformational change upon binding oxygen.
Although complete sequences of haemoglobin and myoglobin were not
available, there was a lot of fragmentary data, on which molecular biologists
jumped with considerable alacrity. Anfinsen's (1959) seminal book dealt with
sequences. Fragmentary globin sequences supported V. Ingram's (1956)
identification of the molecular change in sickle-cell haemoglobin, detected by
Pauling et al. (1949), as a
specific mutation. This was the first demonstration of the clinical relevance of
molecular sequences.
It has now become clear that protein evolution can be understood as the
exploration of sequence space by mutations in DNA, with potential consequent
changes in structure and function. Selection acts on function to close the feedback
loop. Even in the absence of selective pressure, genetic drift can produce changes
in amino acid sequences in populations and species.
Once it was seen that related proteins shared similar structures, the application
to what came to be called homology modelling became clear. The first homology
model was built prior to generally-available computer graphics, using physical
models. Browne et al. (1969) used wire and plastic models to build a model of
bovine lactalbumin from the coordinates of hen egg-white lysozyme.
Homology modelling has become a heavy industry in the field. (See articles in this
volume by Schwede and Marcatili.)
In this article we shall discuss the response of protein structures to amino-acid
sequence changes at the level of individual domains. The article by Jaskolski in
this volume treats domain swapping; that by Garret treats oligomeric proteins.
Another very important aspect of protein evolution, not discussed here, is the
assembly of complex protein structures by concatenation of domains, and its
variation during evolution.
Evolution of the globins
To some extent, the history of the analysis of the changes in structure
produced by changes in amino-acid sequence can be read in terms of the
availability of data on progressively more widely-divergent sets of
homologues. Early studies emphasized the similarities between related proteins.
For example, Kendrew and Watson (1966) proposed the idea of ‘complementary
mutations’: mutations which preserve the sum of the volumes of the residues in
contact, in order to keep the conformation of the monomer constant.
Crystallography had not yet revealed the structural aspects of the allosteric change.
We now recognize that complementary mutations are the exception rather than
the rule.
The problem was that conclusions were drawn on the basis of limited data.
After the work of Perutz, Kendrew and Watson, globin sequences and structures
from non-mammalian species were determined, including invertebrates and even
plants.
On the basis of this wider corpus of structures, Lesk and Chothia (1980)
published an analysis of sequence-structure relationships in globins. They
suggested the following picture of sequence-structure relationships and evolution
in what are now called full-length globins:
1. The principal determinants of the three-dimensional structure of the
full-length globins lie in approximately 60 residues that are involved in the
packing of helices and in the interactions between the helices and the haem
group. In globins, most buried residues appear at interfaces between helices.
2. Although mutations of the buried residues keep the sidechains
nonpolar, the sidechains vary in size. The complementarity asserted by
Kendrew and Watson is not generally observed.
3. In response to mutations at helix-helix interfaces, the tertiary structure
makes adjustments. Shifts in relative position and orientation of
homologous pairs of packed helices may be as much as 7 Å and 30°. This
contrasts with the Kendrew-Watson idea of structure fundamentally
invariant in evolution.
4. Despite the large changes in the relative positions and orientations of
the helices, a subtle feature of the structure is preserved. Although changes
in volume of residues at helix-helix interfaces cause shifts and rotations of
the helices, there is substantial conservation of the reticulation of the residues;
that is, homologous residues tend to make homologous contacts: The pattern
of residue-residue contacts at interfaces tends to remain, even if the residues that make the
contact mutate. That is, if a pair of residues is in contact in one structure, the
homologous residues in a related protein are likely to make a contact also.
This conservation of the contact pattern of the residues arises from the
requirement for maintaining well-packed interfaces during evolution, as a
condition for retaining stability. Many point mutations in a helix-helix
interface can be accommodated in the packing by a wriggling around of
sidechains, and by the shifts and rotations of the helices with respect to each
other. But if the reticulation were not conserved – if the interface were to
jump to a completely new set of residue-residue contacts – the
complementarity of the surfaces would be entirely destroyed, and the
stability of the contact lost.
This also explains why insertions and deletions of amino acids do not
appear in packed helices. Insertion of an amino acid would turn at least part
of the interface by 100°, destroying complementarity.
The observation that the conservation of reticulation (later shown to
characterize other protein families as well as the globins) is perhaps the
property most widely conserved among distant homologues is the basis of
the program DALI (Holm and Sander, 1993), one of the best programs for
comparison and alignment of protein structures.
5. How do the globins reconcile the large changes in geometry of
individual helix-helix contacts to preserve function? Despite the large
changes in the relative positions and orientations of the helices, the
structures of the haem pockets are very similar. The shifts in the helix
packings produced by mutations are coupled to maintain the relative
geometry of the residues that form the haem pocket. It is not uncommon
among homologous proteins for the active site to be more tightly conserved
than other parts of the structure.
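The conservation of reticulation described in point 4 can be illustrated with a small calculation. What follows is a minimal sketch, not the DALI algorithm: it assumes that each structure is reduced to one point per residue (for example the Cα atom), that the two coordinate lists are already in residue-residue correspondence, and that a contact is any pair of residues closer than a distance cutoff. The function names, the 8 Å default cutoff and the minimum sequence separation are illustrative assumptions, not values taken from Lesk and Chothia (1980).

```python
from math import dist


def contact_pairs(coords, cutoff=8.0, min_separation=3):
    """Return the set of index pairs (i, j) whose points lie within `cutoff`.

    Pairs closer than `min_separation` along the chain are skipped, because
    neighbours in sequence are trivially close in space.
    """
    pairs = set()
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if dist(coords[i], coords[j]) <= cutoff:
                pairs.add((i, j))
    return pairs


def conserved_contact_fraction(coords_a, coords_b, cutoff=8.0, min_separation=3):
    """Fraction of the contacts in structure A that the homologous
    residues of structure B also make."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must be given as aligned, equal-length lists")
    contacts_a = contact_pairs(coords_a, cutoff, min_separation)
    contacts_b = contact_pairs(coords_b, cutoff, min_separation)
    if not contacts_a:
        return 0.0
    return len(contacts_a & contacts_b) / len(contacts_a)


# Toy example: two packed segments; in B the second segment has shifted.
structure_a = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (8, 4, 0), (4, 4, 0), (0, 4, 0)]
structure_b = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (10, 5, 0), (6, 5, 0), (2, 5, 0)]
print(conserved_contact_fraction(structure_a, structure_b, cutoff=6.0))  # 0.75
```

In the toy example the second segment shifts by several ångströms, yet three of the four original contacts survive, which is the behaviour the text describes for helix-helix interfaces.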
Lesk and Chothia might conceivably have been excused had they felt a degree
of somewhat smug pride (never openly expressed of course) that they had
corrected the mistakes of giants Kendrew and Watson, and that as a result of their
work the problem of the globin structure was done and dusted. If they had, then
imagine their corresponding chagrin, when the structures of a class of shortened
globin structures appeared. These may be as small as 116 residues.
Truncated globins
Truncated globins are short proteins, occurring in prokaryotes and eukaryotes,
that maintain a recognizable globin fold despite containing only about 120 residues,
substantially fewer than the roughly 150 residues of typical full-length globins. They
have been implicated in diverse functions, including detoxification of NO, and
photosynthesis.
Which residues of full-length globins, and which structural elements, are
sacrificed? Truncated globins retain most but not all of the helices of the standard
globin fold with the notable exception of the loss of the F helix, which contains
the iron-linked histidine. They show a shortening of the A helix and of the CD
region. Of the 59 sites involved in conserved helix-to-helix or helix-to-haem
contacts in full-length globins, 41 of them appear, with conserved contacts, in
truncated globins.
Structural relationships among homologous domains
The days are long gone since globins were the only protein family for which
structures from a wide variety of species were available.
Included in the 80000 protein structures now known are many families in
which the molecules maintain the same basic folding pattern over ranges of
sequence similarity from near-identity down to well below 20% conservation,
even below the so-called ‘twilight zone.’
Papain and selected homologues provide illustrative examples. Figure 1 shows
sequence alignments and superpositions of papaya papain and four homologues: the close relative, kiwi fruit actinidin, and three increasingly distant relatives:
human procathepsin L, human cathepsin B, and Staphylococcus aureus Staphopain.
The more distant the relationship, the lower the similarity in both sequence and
structure. This series of alignments and superpositions shows the progressive
divergence of the sequences and structures.
9pap IPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQYSEQEL
2act LPSYVDWRSAGAVVDIKSQGECGGCWAFSAIATVEGINKITSGSLISLSEQEL

9pap LDCDRR--SYGCNGGYPWSALQLVAQY-GIHYRNTYPYEGVQRYCRSR-EKGP
2act IDCGRTQNTRGCDGGYITDGFQFIINDGGINTEENYPYTAQDGDCD--VAL--

9pap ---YAAKTDGVRQVQPYNQGALLYSIANQPVSVVLQAAGKDFQLYRGGIFVGP
2act QDQKYVTIDTYENVPYNNEWALQTAVTYQPVSVALDAAGDAFKQYASGIFTGP

9pap CGNKVDHAVAAVGYGP-----NYILIKNSWGTGWGENGYIRIKRGTGNSYG
2act CGTAVDHAIVIVGYG-TEGGVDYWIVKNSWDTTWGEEGYMRILRNV-GGAG
Figure 1a

9pap IPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQYSEQEL
1cjl ----VDWREKGYVTPVKNQGQCGSSWAFSATGALEGQMFRKTGRLISLSEQNL

9pap LDCDR--RSYGCNGGYPWSALQLVAQY-GIHYRNTYPYEGVQRYCRSREKGP-
1cjl VDCSGPEGNEGCNGGLMDYAFQYVQDNGGLDSEESYPYEATEESCKY--N-PK

9pap -YAAK--TDGVRQVQPYNQGALLYSIA-NQPVSVVLQAAGKDFQLYRGGIFVG
1cjl YS-VANDA-GFVDIPK-QEKALMKAVATVGPISVAIDAGHESFLFYKEGIYFE

9pap P-CGN--KVDHAVAAVGYGPNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCG
1cjl PDC-SSEDMDHGVLVVGYG----------------------------------

9pap LYTSSFYPVKN---
1cjl -----------FES
Figure 1b

9pap IPEYVDWRQ-KG-A--VT-PVKNQGSCGSCWAFSAVVTIEGIIKIRTG-NLNQY
1huc LPASFDAREQWPQCPTI-KEIRDQGSCGSCWAFGAVEAISDRICIHT-NVSVEV

9pap SEQELLDCDR-R-SYGCNGGYPWSALQLVAQYGIHYR-------NTYPYEGV--
1huc SAEDLLTCCGSMCGDGCNGGYPAEAWNFWTRKGLVSGGLYESHVGCRPYSI-PP

9pap -----------------Q-RYCRSRE--------KGP-YAAKTDGVRQVQPYNQ
1huc CEHHVNGSRPPCTGEGDTPK-CSK-ICEPGYSPTYKQDK-HYGYNSYSVSN-SE

9pap GALLYSIAN-QPVSVVLQAAGKDFQLYRGGIFVGPCGNKV------DHAVAAVG
1huc KDIMAEIYKNGPVEGAFSV-YSDFLLYKSGVYQHV-----TGEMMGGHAIRILG

9pap YGP----NYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN---
1huc WGVENGTPYWLVANSWNTDWGDNGFFKILRG--Q--DHCGIESEVVAGIP-RTD
Figure 1c

9pap IPE----YVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQ-YS
1cv8 ---NEQYVNKL--E-NFKIRETQGNNGWCAGYTMSALLNATYN-----T-NKYH

9pap EQELLDCDRRSYG---------CNGGYPW-S---ALQLVAQYGIHYRNTYPYEG
1cv8 AEAVMRFLH----PNLQGQQFQFTGLT-PREMIYFGQ--T--------------

9pap VQRYCRSREKGPYAAKTDGVRQVQP-Y-NQ--GALL-YSIA-NQ-PVSVVLQAA
1cv8 ---------------QG-RSPQL-LNRMTTYNE--VDNL-TKNNKGIAILG---

9pap GKDFQLYRGGIFVGPCGNKV-----------DHAVAAVGYGP-----NYILIKN
1cv8 --------------------SRVESRNGMHAGHAMAVVGNAKLNNGQEVIIIWN

9pap SWGTGWGENGYIRIKRGTGNSYG-VCGLY-------------TSSFYPVKN
1cv8 PWDN----G-FMTQDA-K-----NN----VIPVSNGDHYQWYSSIYGY---
Figure 1d
Figure 1e
How are changes in sequence and structure measured?
Comparisons of protein sequences and structures depend crucially on
alignment – the assignment of residue-residue correspondences. For pairs of
sequences, the algorithm of Needleman and Wunsch (1970), based on dynamic
programming, determines an optimal alignment. This optimum depends on a set
of scoring parameters, in which the probability of amino-acid substitution is most
often based on statistics of substitutions in easily-alignable pairs of sequences
(originally by M.O. Dayhoff et al. (1978) and subsequently by Henikoff and
Henikoff, (1992)), and on a suitable set of gap penalties. Weaknesses of pairwise
sequence alignment include: (1) Although current algorithms guarantee finding an
optimal solution, the solution may not in general be unique, and the optimal
solution may depend in non-trivial ways on small changes in scoring parameters,
and (2) pairwise sequence alignments cannot be relied on to give the correct
answer for highly-diverged sequences. To some extent, multiple sequence
alignment can overcome these limitations.
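The dynamic-programming idea behind global pairwise alignment can be sketched in a few lines. The version below is a simplified illustration rather than a production implementation: it assumes a flat match/mismatch score in place of a substitution matrix of the Dayhoff or Henikoff type, and a linear gap penalty. The function name and the default scoring values are illustrative choices only.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming.

    Returns (score, aligned_a, aligned_b). Only one optimal alignment is
    reported, even when several alignments share the optimal score.
    """
    n, m = len(seq_a), len(seq_b)

    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two residues
                score[i - 1][j] + gap,      # gap in seq_b
                score[i][j - 1] + gap,      # gap in seq_a
            )

    # Trace back from the bottom-right corner, preferring the diagonal.
    aligned_a, aligned_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                aligned_a.append(seq_a[i - 1])
                aligned_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


print(needleman_wunsch("VDWR", "YVDWR"))  # (2, '-VDWR', 'YVDWR')
```

The traceback prefers the diagonal whenever several moves give the same score, which makes concrete the first weakness noted above: the optimal score is guaranteed, but the alignment reported need not be the only one that achieves it, and changing `match`, `mismatch` or `gap` can change which alignment is returned.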
Nevertheless, the ‘court of last resort’ is structural alignment. Structural
alignment, like any other alignment method, has as its result an assignment of
residue-residue correspondences. However, in structure alignment, the
correspondence is determined from the relative disposition of atoms in space. The
measure of the structural similarity of the aligned residues is given as the root-mean-square deviation (a measure of the minimized average distance between
corresponding atoms).
There are many algorithms for pairwise structural alignment, and for the
generalization of the problem to multiple structural alignment. Nevertheless, a
general problem in structural alignment is how large a substructure to select. One
may be able to choose 100-atom substructures with r.m.s.d. 1.5 Å, or 150-atom
substructures with r.m.s.d. 2.0 Å, or 200-atom substructures with r.m.s.d. 3.5 Å.
Which is preferred? Often, there is an obvious core of the structure that fits much
better than the rest. In such cases one can easily make intelligent choices.
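The trade-off between the size of the substructure and its r.m.s.d. can be made concrete with a short calculation. The sketch below assumes that the two structures have already been superposed and that corresponding atoms are supplied in the same order; it does not perform the superposition itself, and it does not re-superpose on each subset, as a full structural-alignment program would. All function names are illustrative.

```python
from math import dist, sqrt


def deviations(coords_a, coords_b):
    """Distances between corresponding atoms of two superposed structures."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    return [dist(p, q) for p, q in zip(coords_a, coords_b)]


def rmsd(devs):
    """Root-mean-square of a list of atom-atom distances."""
    return sqrt(sum(d * d for d in devs) / len(devs))


def rmsd_by_subset_size(devs, sizes):
    """r.m.s.d. of the k best-fitting atoms, for each k in `sizes`.

    The superposition is held fixed, so these values are an upper bound on
    what re-fitting each subset separately would give.
    """
    ordered = sorted(devs)
    return {k: rmsd(ordered[:k]) for k in sizes if 0 < k <= len(ordered)}


# Toy data: four atoms fit well, one is a clear outlier.
example = [1.0, 1.0, 1.0, 1.0, 3.0]
print(rmsd_by_subset_size(example, [4, 5]))
```

With the toy data, the four best-fitting atoms give an r.m.s.d. of 1.0 Å, and adding the outlier raises it to about 1.6 Å. A sharp rise of this kind as atoms are added is one way to recognise where a well-fitting core ends.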
Comparisons of homologous proteins reveal how a family of
structures accommodates changes in amino acid sequence.
1. Surface residues not involved in function are usually free to mutate,
although the sickle-cell example shows that some adult supervision remains
necessary. Loops on the surface can often accommodate changes by local
refolding.
2. Related structures retain most elements of secondary structure: the
helices and strands of sheet, although their lengths may vary.
3. The core of the structure – the assembly of the central helices and/or
sheets – usually retains its topology or folding pattern. For closely-related
proteins the core comprises almost the entire structure. For distantly-related
proteins, the core may contain only half the residues, or even fewer.
4. Peripheral regions, outside the core, may change their folding pattern
entirely. These regions cannot be aligned. Indeed, structural alignment,
unlike pairwise sequence alignment, can distinguish between alignable regions
of the sequences – those that correspond to reasonably-superposable
structures – and non-alignable regions – those that do not.
5. The relative geometry of the secondary structures, even in the core, is
variable. As a result of mutations, helices and sheets can shift and rotate
with respect to one another. Mutations that change the volumes of buried
residues generally do not change the local conformations of individual
helices or sheets, but distort their spatial assembly by perturbing the packing
at their interfaces. The nature of the forces that stabilize protein structures
sets general limitations on these conformational changes; particular
constraints derived from function vary from case to case.
6. For evolution with retention of function, the structural changes are
subject to global constraints that conserve function, for example, to
maintain the integrity of the active site. For evolution with change in
function, these constraints are replaced by other constraints required by the
altered function, producing greater structural change.
There is a quantitative relationship between the divergence of
the amino acid sequences of the core of a family of proteins,
and the divergence of the structures.
As the sequence diverges, there are progressively increasing distortions in the
mainchain conformation, and the fraction of the residues in the core usually
decreases (Chothia & Lesk, 1986). Until the fraction of identical residues in the
sequence drops below about 40-50%, these effects are relatively modest. Almost
all the structure remains in the core, and the deformation of the mainchain atoms
is on the average no more than about 1.0 Å. With increasing sequence divergence,
some regions refold entirely, reducing the size of the core, and the distortions of
the remaining core residues increase in magnitude. In some cases the changes in
structure and sequence are more extreme. Figure 1 showed such progressive
changes.
A correlation between the divergence of sequence and structure applies to all
families of globular proteins. Figure 2a shows the changes in structure of the core,
expressed as the root-mean-square deviation of the backbone atoms after optimal
superposition, plotted against the sequence divergence: the % conserved amino
acids of the core after optimal alignment. The points correspond to pairs of
homologous proteins from many related families. (Those at 100% residue identity
are proteins for which the structure was determined in two or more crystal
environments. The deviations show that crystal packing forces – and, to a lesser
extent, solvent and temperature – can modify slightly the conformation of the
proteins.) Figure 2b shows the changes in the fraction of residues in the core as a
function of sequence divergence. The fraction of residues in the cores of distantly
related proteins can vary widely: in some cases the fraction of residues in the core
remains high, in others it can drop to below 50% of the structure.
Figure 2.
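The two quantities plotted in Figure 2a can be computed as follows. This is a hedged sketch: `percent_identity` counts identical residues over the aligned (non-gap) columns of a pairwise alignment, and `expected_core_rmsd` evaluates an exponential relation of the form Δ = 0.40 exp(1.87 H), where H is the fraction of non-identical residues in the core. The coefficients are those of the fit reported by Chothia and Lesk (1986) and should be checked against the original paper before being relied on. The function names are illustrative.

```python
from math import exp


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns where neither
    sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    paired = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not paired:
        return 0.0
    identical = sum(1 for a, b in paired if a == b)
    return 100.0 * identical / len(paired)


def expected_core_rmsd(identity_percent, prefactor=0.40, exponent=1.87):
    """Approximate backbone r.m.s.d. (in Å) of the common core.

    Assumes the relation delta = prefactor * exp(exponent * H), with
    H = 1 - identity; the default coefficients are an assumption to be
    verified against Chothia and Lesk (1986).
    """
    h = 1.0 - identity_percent / 100.0
    return prefactor * exp(exponent * h)


print(percent_identity("AC-DE", "ACWDF"))  # 75.0
print(round(expected_core_rmsd(50.0), 2))
```

At 50% identity the relation gives a value close to 1 Å, in line with the statement above that the deformation of the mainchain remains modest until identity falls below about 40-50%.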
How far can evolution go?
Although considerable evidence supports the idea that the basic common core
fold of a protein remains intact during evolution, this statement depends on the
horizon within which we can confidently anticipate and identify homologues. The
truncated globins are one cautionary example, as is the difficulty of deciding
definitively whether globins and phycocyanins are homologues or examples of
convergent evolution (Pastore & Lesk, 1990).
N.V. Grishin (2001) and, subsequently, others, have suggested that the
situation may be even worse. The C-terminal domain of catabolite gene activator
protein (CAP) is an all-helical protein. The HIN recombinase DNA-binding
protein contains a sheet. Looking only at these two structures one would
unhesitatingly conclude that their folding patterns were markedly different, and
that they were not homologous. However, Grishin adduced potential
intermediates: the N-terminal domain of biotin repressor and the C-terminal
domain of ribosomal protein L11 (see Figure 3.) Successive pairs in this series –
CAP C-terminal domain, biotin repressor N-terminal domain, L11 C-terminal
domain, and HIN recombinase – show not entirely dissimilar folding patterns. It
is not easy to rule out the possibility of an evolutionary pathway
between CAP and HIN recombinase (even if biotin repressor and L11 do not
themselves lie on it); nor, conversely, to conclude that these four structures are
homologues.
Figure 3.
Keywords
protein structure, amino-acid sequence, alignment, evolution,
structural core
Summary
Homologous proteins tend to retain common folding patterns. However, although the general folding
pattern is preserved, there are distortions which increase as the amino acid
sequences progressively diverge. These distortions are not uniformly
distributed throughout the structure. A large central core of the structure
usually retains the same qualitative fold, and other parts of the structure
change conformation more radically. As homologous proteins diverge, these
changes in structure give rise to changes in function, including regulation.
Cross-References
→ Macromolecular crystallography
→ Structural impact of SNPs
→ Structure comparison methods
→ Structural genomics
→ The evolution of protein structures
FIGURE CAPTIONS
Figure 1. (a) Sequence alignment and structural superposition of papaya papain
[wwPDB entry: 9PAP] and kiwi fruit actinidin [2ACT]. (b) Sequence alignment and
structural superposition of papaya papain [9PAP] and human procathepsin L
[1CJL]. (c) Sequence alignment and structural superposition of papaya papain
[9PAP] and human liver cathepsin B [1HUC]. Note, in both the sequence alignment
and the superposition, the higher similarity at the beginning and end of the
sequences than in the middle region. (d) Sequence alignment and structural
superposition of papaya papain [9PAP] and S. aureus staphopain [1CV8]. (e)
Papaya papain, to provide a key to navigate between the sequence alignments
and structure superpositions.
Figure 2. Relationships between divergence of amino acid sequence and three-dimensional structure of the core, in evolving proteins. (a) Variation of r.m.s.
deviation of the core with the percent identical residues in the core. (b)
Variation of size of the core with the percent identical residues in the core.
This figure shows results calculated for 32 pairs of homologous proteins of a
variety of structural types. (Adapted from Chothia and Lesk (1986).)
Figure 3. Four proteins with different assemblies of secondary structure elements.
Could there be an evolutionary pathway between such divergent structures? (a)
C-terminal domain of catabolite gene activator protein (CAP) [1CGP]. (b) HIN
recombinase DNA-binding protein [1HCR]. (c) N-terminal domain of biotin
repressor [1BIA]. (d) C-terminal domain of ribosomal protein L11 [1FOW].
References
Anfinsen, CB. (1959). The molecular basis of evolution. Wiley: New York
Braunitzer G, Gehring-Mueller R, Hilschmann N, Hilse K, Hobom G, Rudloff V,
and Wittmann-Liebold B (1961) Die Konstitution des normalen adulten
Humanhämoglobins. Z Physiol Chem 325: 283-286.
Browne, WJ, North, AC, Phillips, DC, Brew, K, Vanaman, TC, and Hill, RL
(1969). A possible three-dimensional structure of bovine lactalbumin based
on that of hen's egg-white lysozyme. J Mol Biol 42: 65-86.
Chothia C, Lesk A. (1986). Relationship between the divergence of sequence and
structure in proteins. The EMBO Journal 5: 823-826.
Dayhoff MO, Schwartz R and Orcutt BC (1978). A Model of Evolutionary
Change in Proteins. In: Atlas of protein sequence and structure (volume 5,
supplement 3 ed.). Nat. Biomed. Res. Found. pp. 345-358.
Dobzhansky, T. (1973). Nothing in biology makes sense except in the light of
evolution. Amer. Biol. Teacher 35: 125-129.
Grishin, NV (2001). Fold change in evolution of protein structures. J. Struc. Biol.
134, 167-185.
Henikoff S, Henikoff, JG. (1992). Amino acid substitution matrices from protein
blocks. Proc. Nat. Acad. Sci. USA 89: 10915-10919.
Holm L, Sander C (1993) Protein structure comparison by alignment of distance
matrices, J Mol Biol 233:123-38.
Ingram VM (1956) A specific chemical difference between the globins of normal
human and sickle-cell anaemia haemoglobin. Nature 178: 792-794.
Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC (1958).
A three-dimensional model of the myoglobin molecule obtained by x-ray
analysis. Nature 181: 662-666.
Kendrew, JC, Watson, H.C. (1966). Stabilizing interactions in globular proteins.
In: Ciba Foundation Symposium - Principles of Biomolecular Organization.
Eds: Wolstenholme GEW, O'Connor M. Ciba Foundation, London.
Lesk, AM, Chothia, C. (1980). How different amino acid sequences determine
similar protein structures: The structure and evolutionary dynamics of the
globins J Mol Biol 136: 225-270.
Needleman SB and Wunsch, CD. (1970). A general method applicable to the
search for similarities in the amino acid sequence of two proteins. J Mol Biol
48: 443-453.
Pastore A, Lesk AM. (1990). Comparison of the structures of globins and
phycocyanins: evidence for evolutionary relationship Proteins: Structure,
Function and Genetics 8: 133-155.
Pauling L, Itano HA, Singer SJ, Wells IC (1949). Sickle cell anemia, a molecular
disease. Science 110: 543-548.
Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North ACT (1960).
Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å
resolution, obtained by X-ray analysis. Nature 185: 416-422.
Perutz MF, Kendrew JC, Watson HC (1965). Structure and function of
haemoglobin: II. Some relations between polypeptide chain configuration and
amino acid sequence. J Mol Biol 13: 669-678.
Sanger F, Tuppy H (1951a) The amino-acid sequence in the phenylalanyl chain of
insulin. 1. The identification of lower peptides from partial hydrolysates
Biochem J 49: 463-481.
Sanger F, Tuppy H (1951b) The amino-acid sequence in the phenylalanyl chain of
insulin. 2. The investigation of peptides from enzymic hydrolysates, Biochem J
49: 481-490.