LONG ENTRY

The Relationship Between the Divergence of Amino-acid Sequence and Protein Structure

Arthur M. Lesk
The Pennsylvania State University
Department of Biochemistry and Molecular Biology and the Huck Institute of Genomics, Proteomics and Bioinformatics
University Park, PA 16801 U.S.A.

Introduction

The changes in protein structure in response to changes in amino acid sequence show how proteins evolve novel structures and functions, and also illuminate the mechanism by which amino acid sequence determines protein structure.

Appreciating the relationship between amino-acid sequence and conformation over a family of proteins can provide the key to understanding the structure and function of even individual proteins. For, looking at a single protein, it is often unclear (except in the cases of obvious catalytic residues in an active site) what the importance and roles of particular residues are. Nature can do the analysis for us, by providing patterns of conservation: simplistically, residues important for structure and function are conserved, those less essential are variable. The situation illustrates Dobzhansky’s assertion that: ‘Nothing in biology makes sense except in the light of evolution.’ (Dobzhansky, 1973).

The correlation of changes in sequence, structure and function through evolution has occupied molecular biologists since the first structures were determined. When Kendrew, Perutz and colleagues first determined the structures of sperm whale myoglobin and human haemoglobin (Kendrew et al. 1958; Perutz et al. 1960), they observed that the structures of the haemoglobin subunits and myoglobin resembled one another. Note that the amino acid sequences of globins had not yet been determined: although Sanger had established the amino acid sequence of insulin in 1951 (Sanger and Tuppy, 1951a,b), the complete amino acid sequence of human haemoglobin was not published until ten years later (Braunitzer et al. 1961).
Thus the first comparisons of complete globins in terms of structure were earlier than comparisons based on sequences. Much earlier, in fact: some fifty years before, E.T. Reichert and A.P. Brown (1909) published a study of crystals of haemoglobin isolated from different species of fishes. Haemoglobin crystallography, three years before the discovery of X-ray diffraction, was limited to measuring the angles between crystal faces. Steno’s law (1669) states that the interfacial angles of all crystals of a substance are the same, independent of the size and macroscopic shape of the crystal. Reichert and Brown found that the patterns of divergence of these angles correlated with the evolutionary tree of the species of fishes. They even found differences between crystals of deoxy- and oxyhaemoglobin. This work has my nomination for the most premature scientific result ever published.

We can now interpret and appreciate these observations. The formation of crystals implies that the molecules can take up a definite structure, able to pack into regular arrays. The differences in interfacial angles imply that crystals of haemoglobins from different fishes have different structures. The correlation of the divergence patterns of the crystals and the species implies that evolution is shaping molecules as well as bodies, in parallel processes. The differences between crystals of deoxy- and oxyhaemoglobin imply that the protein undergoes a conformational change upon binding oxygen.

Although complete sequences of haemoglobin and myoglobin were not available, there was a good deal of fragmentary data, on which molecular biologists jumped with considerable alacrity. Anfinsen's (1959) seminal book dealt with sequences. Fragmentary globin sequences supported V. Ingram's (1956) identification, as a specific mutation, of the molecular change in sickle-cell haemoglobin that Pauling et al. (1949) had detected. This was the first demonstration of the clinical relevance of molecular sequences.
It has now become clear that protein evolution can be understood as the exploration of sequence space by mutations in DNA, with potential consequent changes in structure and function. Selection acts on function to close the feedback loop. Even in the absence of selective pressure, genetic drift can produce changes in amino acid sequences in populations and species.

Once it was seen that related proteins shared similar structures, the application to what came to be called homology modelling became clear. The first homology model was built prior to generally-available computer graphics, using physical models. Browne et al. (1969) used wire and plastic models to build a model of bovine lactalbumin from the coordinates of hen egg-white lysozyme. Homology modelling has become a heavy industry in the field. (See articles in this volume by Schwede and Marcatili.)

In this article we shall discuss the response of protein structures to amino-acid sequence changes at the level of individual domains. The article by Jaskolski in this volume treats domain swapping; that by Garret treats oligomeric proteins. Another very important aspect of protein evolution, not discussed here, is the assembly of complex protein structures by concatenation of domains, and its variation during evolution.

Evolution of the globins

To some extent, the history of the analysis of the changes in structure produced by changes in amino-acid sequence can be read in terms of the availability of data on progressively more widely divergent sets of homologues. Early studies emphasized the similarities between related proteins. For example, Kendrew and Watson (1966) proposed the idea of ‘complementary mutations’: mutations which preserve the sum of the volumes of the residues in contact, in order to keep the conformation of the monomer constant. Crystallography had not yet revealed the structural aspects of the allosteric change.
We now recognize that complementary mutations are the exception rather than the rule. The problem was that conclusions were drawn on the basis of limited data.

After the work of Perutz, Kendrew and Watson, globin sequences and structures from non-mammalian species were determined, including invertebrates and even plants. On the basis of this wider corpus of structures, Lesk and Chothia (1980) published an analysis of sequence-structure relationships in globins. They suggested the following picture of sequence-structure relationships and evolution in what are now called full-length globins:

1. The principal determinants of the three-dimensional structure of the full-length globins lie in approximately 60 residues that are involved in the packing of helices and in the interactions between the helices and the haem group. In globins, most buried residues appear at interfaces between helices.

2. Although mutations of the buried residues keep the sidechains nonpolar, the sidechains vary in size. The complementarity asserted by Kendrew and Watson is not generally observed.

3. In response to mutations at helix-helix interfaces, the tertiary structure makes adjustments. Shifts in relative position and orientation of homologous pairs of packed helices may be as much as 7 Å and 30°. This contrasts with the Kendrew-Watson idea of structure fundamentally invariant in evolution.

4. Despite the large changes in the relative positions and orientations of the helices, a subtle feature of the structure is preserved. Although changes in volume of residues at helix-helix interfaces cause shifts and rotations of the helices, there is substantial conservation of the reticulation of the residues; that is, homologous residues tend to make homologous contacts. The pattern of residue-residue contacts at interfaces tends to remain, even if the residues that make the contact mutate.
That is, if a pair of residues is in contact in one structure, the homologous residues in a related protein are likely to make a contact also. This conservation of the contact pattern of the residues arises from the requirement for maintaining well-packed interfaces during evolution, as a condition for retaining stability. Many point mutations in a helix-helix interface can be accommodated in the packing by a wriggling around of sidechains, and by the shifts and rotations of the helices with respect to each other. But if the reticulation were not conserved (if the interface were to jump to a completely new set of residue-residue contacts) the complementarity of the surfaces would be entirely destroyed, and the stability of the contact lost. This also explains why insertions and deletions of amino acids do not appear in packed helices. Insertion of an amino acid would turn at least part of the interface by 100°, destroying complementarity. Conservation of reticulation was later shown to characterize other protein families as well as the globins, and is perhaps the property most widely conserved among distant homologues. This observation is the basis of the program DALI (Holm and Sander, 1993), one of the best programs for comparison and alignment of protein structures.

5. How do the globins reconcile the large changes in geometry of individual helix-helix contacts with the preservation of function? Despite the large changes in the relative positions and orientations of the helices, the structures of the haem pockets are very similar. The shifts in the helix packings produced by mutations are coupled to maintain the relative geometry of the residues that form the haem pocket. It is not uncommon among homologous proteins for the active site to be more tightly conserved than other parts of the structure.
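What conservation of the contact pattern means operationally can be illustrated with a short Python sketch. It takes one representative point per residue (for example a sidechain centroid), lists the residue pairs in contact, and then asks what fraction of the contacts in one protein reappear between the homologous residues of a second. The 6 Å cutoff, the minimum sequence separation, and the function names are assumptions made for this example; they are not the contact criteria used by Lesk and Chothia (1980) or by DALI.

```python
import math
from itertools import combinations


def contacts(coords, cutoff=6.0, min_separation=3):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    coords: one (x, y, z) point per residue, in Angstroms.
    cutoff and min_separation are illustrative assumptions.
    """
    pairs = set()
    for i, j in combinations(range(len(coords)), 2):
        # Skip near neighbours along the chain; they are always close.
        if j - i >= min_separation and math.dist(coords[i], coords[j]) <= cutoff:
            pairs.add((i, j))
    return pairs


def conserved_contact_fraction(contacts_a, contacts_b, a_to_b):
    """Fraction of contacts in protein A that are also made by the
    homologous residues in protein B.

    a_to_b maps residue indices of A to the aligned residue indices of B;
    contacts involving an unaligned residue are left out of the count.
    """
    mappable = [(i, j) for (i, j) in contacts_a if i in a_to_b and j in a_to_b]
    if not mappable:
        return 0.0
    kept = 0
    for i, j in mappable:
        homologous_pair = tuple(sorted((a_to_b[i], a_to_b[j])))
        if homologous_pair in contacts_b:
            kept += 1
    return kept / len(mappable)
```

A value near 1 for a pair of distant homologues would correspond to the conserved reticulation described above; a value near 0 would indicate that the interface had jumped to a new set of contacts.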
Lesk and Chothia might conceivably have been excused had they felt a degree of somewhat smug pride (never openly expressed of course) that they had corrected the mistakes of giants Kendrew and Watson, and that as a result of their work the problem of the globin structure was done and dusted. If they had, then imagine their corresponding chagrin when the structures of a class of shortened globins appeared. These may be as small as 116 residues.

Truncated globins

Truncated globins are short proteins, occurring in prokaryotes and eukaryotes, that maintain a recognizable globin fold despite containing only about 120 residues, substantially fewer than the 150 residues of typical full-length globins. They have been implicated in diverse functions, including detoxification of NO, and photosynthesis.

Which residues of full-length globins, and which structural elements, are sacrificed? Truncated globins retain most of the helices of the standard globin fold; the notable exception is the loss of the F helix, which contains the iron-linked histidine. They show a shortening of the A helix and of the CD region. Of the 59 sites involved in conserved helix-to-helix or helix-to-haem contacts in full-length globins, 41 appear, with conserved contacts, in truncated globins.

Structural relationships among homologous domains

The days are long gone since globins were the only protein family for which structures from a wide variety of species were available. Included in the 80 000 protein structures now known are many families in which the molecules maintain the same basic folding pattern over ranges of sequence similarity from near-identity down to well below 20% conservation, even below the so-called ‘twilight zone.’ Papain and selected homologues provide illustrative examples.
Figure 1 shows sequence alignments and superpositions of papaya papain and four homologues: the close relative, kiwi fruit actinidin, and increasingly distant relatives: human procathepsin L, human cathepsin B, and Staphylococcus aureus staphopain. The more distant the relationship, the lower the similarity in both sequence and structure. This series of alignments and superpositions shows the progressive divergence of the sequences and structures.

9pap  IPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQYSEQEL
2act  LPSYVDWRSAGAVVDIKSQGECGGCWAFSAIATVEGINKITSGSLISLSEQEL

9pap  LDCDRR--SYGCNGGYPWSALQLVAQY-GIHYRNTYPYEGVQRYCRSR-EKGP
2act  IDCGRTQNTRGCDGGYITDGFQFIINDGGINTEENYPYTAQDGDCD--VAL-

9pap  ---YAAKTDGVRQVQPYNQGALLYSIANQPVSVVLQAAGKDFQLYRGGIFVGP
2act  QDQKYVTIDTYENVPYNNEWALQTAVTYQPVSVALDAAGDAFKQYASGIFTGP

9pap  CGNKVDHAVAAVGYGP-----NYILIKNSWGTGWGENGYIRIKRGTGNSYG
2act  CGTAVDHAIVIVGYG-TEGGVDYWIVKNSWDTTWGEEGYMRILRNV-GGAG

Figure 1a

9pap  IPEYVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQYSEQEL
1cjl  ----VDWREKGYVTPVKNQGQCGSSWAFSATGALEGQMFRKTGRLISLSEQNL

9pap  LDCDR--RSYGCNGGYPWSALQLVAQY-GIHYRNTYPYEGVQRYCRSREKGP
1cjl  VDCSGPEGNEGCNGGLMDYAFQYVQDNGGLDSEESYPYEATEESCKY--N-PK

9pap  -YAAK--TDGVRQVQPYNQGALLYSIA-NQPVSVVLQAAGKDFQLYRGGIFVG
1cjl  YS-VANDA-GFVDIPK-QEKALMKAVATVGPISVAIDAGHESFLFYKEGIYFE

9pap  P-CGN--KVDHAVAAVGYGPNYILIKNSWGTGWGENGYIRIKRGTGNSYGVCG
1cjl  PDC-SSEDMDHGVLVVGYG---------------------------------

9pap  LYTSSFYPVKN--
1cjl  -----------FES

Figure 1b

9pap  IPEYVDWRQ-KG-A--VT-PVKNQGSCGSCWAFSAVVTIEGIIKIRTG-NLNQY
1huc  LPASFDAREQWPQCPTI-KEIRDQGSCGSCWAFGAVEAISDRICIHT-NVSVEV

9pap  SEQELLDCDR-R-SYGCNGGYPWSALQLVAQYGIHYR-------NTYPYEGV-
1huc  SAEDLLTCCGSMCGDGCNGGYPAEAWNFWTRKGLVSGGLYESHVGCRPYSI-PP

9pap  -----------------Q-RYCRSRE--------KGP-YAAKTDGVRQVQPYNQ
1huc  CEHHVNGSRPPCTGEGDTPK-CSK-ICEPGYSPTYKQDK-HYGYNSYSVSN-SE

9pap  GALLYSIAN-QPVSVVLQAAGKDFQLYRGGIFVGPCGNKV------DHAVAAVG
1huc  KDIMAEIYKNGPVEGAFSV-YSDFLLYKSGVYQHV-----TGEMMGGHAIRILG

9pap  YGP----NYILIKNSWGTGWGENGYIRIKRGTGNSYGVCGLYTSSFYPVKN--
1huc  WGVENGTPYWLVANSWNTDWGDNGFFKILRG--Q--DHCGIESEVVAGIP-RTD

Figure 1c

9pap  IPE----YVDWRQKGAVTPVKNQGSCGSCWAFSAVVTIEGIIKIRTGNLNQ-YS
1cv8  ---NEQYVNKL--E-NFKIRETQGNNGWCAGYTMSALLNATYN-----T-NKYH

9pap  EQELLDCDRRSYG---------CNGGYPW-S---ALQLVAQYGIHYRNTYPYEG
1cv8  AEAVMRFLH----PNLQGQQFQFTGLT-PREMIYFGQ--T-------------

9pap  VQRYCRSREKGPYAAKTDGVRQVQP-Y-NQ--GALL-YSIA-NQ-PVSVVLQAA
1cv8  ---------------QG-RSPQL-LNRMTTYNE--VDNL-TKNNKGIAILG--

9pap  GKDFQLYRGGIFVGPCGNKV-----------DHAVAAVGYGP-----NYILIKN
1cv8  --------------------SRVESRNGMHAGHAMAVVGNAKLNNGQEVIIIWN

9pap  SWGTGWGENGYIRIKRGTGNSYG-VCGLY-------------TSSFYPVKN
1cv8  PWDN----G-FMTQDA-K-----NN----VIPVSNGDHYQWYSSIYGY---

Figure 1d

Figure 1e

How are changes in sequence and structure measured?

Comparisons of protein sequences and structures depend crucially on alignment: the assignment of residue-residue correspondences. For pairs of sequences, the algorithm of Needleman and Wunsch (1970), based on dynamic programming, determines an optimal alignment. This optimum depends on a set of scoring parameters, in which the probability of amino-acid substitution is most often based on statistics of substitutions in easily-alignable pairs of sequences (originally by M.O. Dayhoff et al. (1978) and subsequently by Henikoff and Henikoff (1992)), and on a suitable set of gap penalties.
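The dynamic-programming idea can be sketched in a few lines of Python. The version below is a minimal illustration of global alignment in the style of Needleman and Wunsch, not a reproduction of any published implementation: it assumes a flat match/mismatch score in place of a Dayhoff or Henikoff substitution matrix, and a simple linear gap penalty. The function name and default parameter values are chosen for this example only.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b by dynamic programming.

    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    The scoring scheme is a deliberately simple assumption.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` aligns the three shared letters and places a single gap opposite the C. The traceback returns only one optimal alignment; when several paths tie, which one is reported depends on the order in which the three moves are tested.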
Weaknesses of pairwise sequence alignment include: (1) although current algorithms guarantee finding an optimal solution, that solution may not in general be unique, and it may depend in non-trivial ways on small changes in scoring parameters; and (2) pairwise sequence alignments cannot be relied on to give the correct answer for highly diverged sequences. To some extent, multiple sequence alignment can overcome these limitations. Nevertheless, the ‘court of last resort’ is structural alignment.

Structural alignment, like any other alignment method, has as its result an assignment of residue-residue correspondences. However, in structure alignment, the correspondence is determined from the relative disposition of atoms in space. The measure of the structural similarity of the aligned residues is given as the root-mean-square deviation (r.m.s.d.), a measure of the minimized average distance between corresponding atoms. There are many algorithms for pairwise structural alignment, and for the generalization of the problem to multiple structural alignment.

Nevertheless, a general problem in structural alignment is how large a substructure to select. One may be able to choose 100-atom substructures with r.m.s.d. 1.5 Å, or 150-atom substructures with r.m.s.d. 2.0 Å, or 200-atom substructures with r.m.s.d. 3.5 Å. Which is preferred? Often, there is an obvious core of the structure that fits much better than the rest. In such cases one can easily make intelligent choices.

Comparisons of homologous proteins reveal how a family of structures accommodates changes in amino acid sequence.

1. Surface residues not involved in function are usually free to mutate, although the sickle-cell example shows that some adult supervision remains necessary. Loops on the surface can often accommodate changes by local refolding.

2. Related structures retain most elements of secondary structure: the helices and strands of sheet, although their lengths may vary.

3. The core of the structure, the assembly of the central helices and/or sheets, usually retains its topology or folding pattern. For closely-related proteins the core comprises almost the entire structure. For distantly-related proteins, the core may contain only half the residues, or even fewer.

4. Peripheral regions, outside the core, may change their folding pattern entirely. These regions cannot be aligned. Indeed, structural alignment, unlike pairwise sequence alignment, can distinguish between alignable regions of the sequences (those that correspond to reasonably superposable structures) and non-alignable regions (those that do not).

5. The relative geometry of the secondary structures, even in the core, is variable. As a result of mutations, helices and sheets can shift and rotate with respect to one another. Mutations that change the volumes of buried residues generally do not change the local conformations of individual helices or sheets, but distort their spatial assembly by perturbing the packing at their interfaces. The nature of the forces that stabilize protein structures sets general limitations on these conformational changes; particular constraints derived from function vary from case to case.

6. For evolution with retention of function, the structural changes are subject to global constraints that conserve function, for example, to maintain the integrity of the active site. For evolution with change in function, these constraints are replaced by other constraints required by the altered function, producing greater structural change.

There is a quantitative relationship between the divergence of the amino acid sequences of the core of a family of proteins, and the divergence of the structures. As the sequence diverges, there are progressively increasing distortions in the mainchain conformation, and the fraction of the residues in the core usually decreases (Chothia & Lesk, 1986).
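The two quantities being related here can be computed in a few lines once an alignment and a superposition are in hand. The Python sketch below is illustrative only. It assumes that the alignment is supplied as two equal-length gapped strings, and that the coordinates of corresponding atoms have already been optimally superposed by some other means; the superposition step itself is not shown. The function names are hypothetical.

```python
import math


def percent_identity(aligned_a, aligned_b):
    """Percentage of aligned positions (neither residue a gap) that are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between corresponding atoms.

    coords_a and coords_b are equal-length lists of (x, y, z) tuples in
    Angstroms, assumed to be already superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

Evaluating both functions over the core residues of many pairs of homologues, and plotting one against the other, gives a scatter of the kind analysed by Chothia and Lesk (1986).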
Until the fraction of identical residues in the sequence drops below about 40-50%, these effects are relatively modest. Almost all the structure remains in the core, and the deformation of the mainchain atoms is on the average no more than about 1.0 Å. With increasing sequence divergence, some regions refold entirely, reducing the size of the core, and the distortions of the remaining core residues increase in magnitude. In some cases the changes in structure and sequence are more extreme. Figure 1 showed such progressive changes.

A correlation between the divergence of sequence and structure applies to all families of globular proteins. Figure 2a shows the changes in structure of the core, expressed as the root-mean-square deviation of the backbone atoms after optimal superposition, plotted against the sequence divergence: the percentage of conserved amino acids in the core after optimal alignment. The points correspond to pairs of homologous proteins from many related families. (Those at 100% residue identity are proteins for which the structure was determined in two or more crystal environments. The deviations show that crystal packing forces, and to a lesser extent solvent and temperature, can slightly modify the conformation of the proteins.) Figure 2b shows the changes in the fraction of residues in the core as a function of sequence divergence. The fraction of residues in the cores of distantly related proteins can vary widely: in some cases the fraction of residues in the core remains high, in others it can drop to below 50% of the structure.

Figure 2.

How far can evolution go?

Although considerable evidence supports the idea that the basic common core fold of a protein remains intact during evolution, this statement depends on the horizon within which we can confidently anticipate and identify homologues.
The truncated globins are one cautionary example, as is the difficulty of deciding definitively whether globins and phycocyanins are homologues or examples of convergent evolution (Pastore & Lesk, 1990).

N.V. Grishin (2001) and, subsequently, others, have suggested that the situation may be even worse. The C-terminal domain of catabolite gene activator protein (CAP) is an all-helical protein. The HIN recombinase DNA-binding protein contains a sheet. Looking only at these two structures one would unhesitatingly conclude that their folding patterns were markedly different, and that they were not homologous. However, Grishin adduced potential intermediates: the N-terminal domain of biotin repressor and the C-terminal domain of ribosomal protein L11 (see Figure 3). Successive pairs in this series (CAP C-terminal domain, biotin repressor N-terminal domain, L11 C-terminal domain, and HIN recombinase) show not entirely dissimilar folding patterns. It is not easy to rule out the possibility of an evolutionary pathway between CAP and HIN recombinase (even if biotin repressor and L11 do not themselves lie on it); nor, conversely, to conclude that these four structures are homologues.

Figure 3.

Keywords

protein structure, amino-acid sequence, alignment, evolution, structural core

Summary

Homologous proteins tend to retain common folding patterns. However, although the general folding pattern is preserved, there are distortions which increase as the amino acid sequences progressively diverge. These distortions are not uniformly distributed throughout the structure. A large central core of the structure usually retains the same qualitative fold, and other parts of the structure change conformation more radically. As homologous proteins diverge, these changes in structure give rise to changes in function, including regulation.
Cross-References

→ Macromolecular crystallography
→ Structural impact of SNPs
→ Structure comparison methods
→ Structural genomics
→ The evolution of protein structures

FIGURE CAPTIONS

Figure 1. (a) Sequence alignment and structural superposition of papaya papain [wwPDB entry: 9PAP] and kiwi fruit actinidin [2ACT]. (b) Sequence alignment and structural superposition of papaya papain [9PAP] and human procathepsin L [1CJL]. (c) Sequence alignment and structural superposition of papaya papain [9PAP] and human liver cathepsin B [1HUC]. Note, in both the sequence alignment and the superposition, the higher similarity at the beginning and end of the sequences than in the middle region. (d) Sequence alignment and structural superposition of papaya papain [9PAP] and S. aureus staphopain [1CV8]. (e) Papaya papain, to provide a key to navigate between the sequence alignments and structure superpositions.

Figure 2. Relationships between divergence of amino acid sequence and three-dimensional structure of the core, in evolving proteins. (a) Variation of r.m.s. deviation of the core with the percent identical residues in the core. (b) Variation of size of the core with the percent identical residues in the core. This figure shows results calculated for 32 pairs of homologous proteins of a variety of structural types. (Adapted from Chothia and Lesk (1986).)

Figure 3. Four proteins with different assemblies of secondary structure elements. Could there be an evolutionary pathway between such divergent structures? (a) C-terminal domain of catabolite gene activator protein (CAP) [1CGP]. (b) HIN recombinase DNA-binding protein [1HCR]. (c) N-terminal domain of biotin repressor [1BIA]. (d) C-terminal domain of ribosomal protein L11 [1FOW].

References

Anfinsen CB (1959). The molecular basis of evolution. Wiley: New York.

Braunitzer G, Gehring-Mueller R, Hilschmann N, Hilse K, Hobom G, Rudloff V, Wittmann-Liebold B (1961). Die Konstitution des normalen adulten Humanhämoglobins. Z Physiol Chem 325: 283-286.

Browne WJ, North AC, Phillips DC, Brew K, Vanaman TC, Hill RL (1969). A possible three-dimensional structure of bovine lactalbumin based on that of hen's egg-white lysozyme. J Mol Biol 42: 65-86.

Chothia C, Lesk A (1986). Relationship between the divergence of sequence and structure in proteins. The EMBO Journal 5: 823-826.

Dayhoff MO, Schwartz R, Orcutt BC (1978). A model of evolutionary change in proteins. In: Atlas of protein sequence and structure (volume 5, supplement 3 ed.). Nat. Biomed. Res. Found. pp. 345-358.

Dobzhansky T (1973). Nothing in biology makes sense except in the light of evolution. Amer. Biol. Teacher 35: 125-129.

Grishin NV (2001). Fold change in evolution of protein structures. J. Struc. Biol. 134: 167-185.

Henikoff S, Henikoff JG (1992). Amino acid substitution matrices from protein blocks. Proc. Nat. Acad. Sci. USA 89: 10915-10919.

Holm L, Sander C (1993). Protein structure comparison by alignment of distance matrices. J Mol Biol 233: 123-138.

Ingram VM (1956). A specific chemical difference between the globins of normal human and sickle-cell anaemia haemoglobin. Nature 178: 792-794.

Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC (1958). A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 181: 662-666.

Kendrew JC, Watson HC (1966). Stabilizing interactions in globular proteins. In: Ciba Foundation Symposium - Principles of Biomolecular Organization. Eds: Wolstenholme GEW, O'Connor M. Ciba Foundation, London.

Lesk AM, Chothia C (1980). How different amino acid sequences determine similar protein structures: The structure and evolutionary dynamics of the globins. J Mol Biol 136: 225-270.

Needleman SB, Wunsch CD (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443-453.

Pastore A, Lesk AM (1990). Comparison of the structures of globins and phycocyanins: evidence for evolutionary relationship. Proteins: Structure, Function and Genetics 8: 133-155.

Pauling L, Itano HA, Singer SJ, Wells IC (1949). Sickle cell anemia, a molecular disease. Science 110: 543-548.

Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North ACT (1960). Structure of haemoglobin: a three-dimensional Fourier synthesis at 5.5-Å resolution, obtained by X-ray analysis. Nature 185: 416-422.

Perutz MF, Kendrew JC, Watson HC (1965). Structure and function of haemoglobin: II. Some relations between polypeptide chain configuration and amino acid sequence. J Mol Biol 13: 669-678.

Sanger F, Tuppy H (1951a). The amino-acid sequence in the phenylalanyl chain of insulin. 1. The identification of lower peptides from partial hydrolysates. Biochem J 49: 463-481.

Sanger F, Tuppy H (1951b). The amino-acid sequence in the phenylalanyl chain of insulin. 2. The investigation of peptides from enzymic hydrolysates. Biochem J 49: 481-490.