Lecture 2:

4. SECONDARY STRUCTURE

4.1 Polypeptide main chain conformations

Many of the important features of protein three-dimensional structure arise from the regular repeating nature of a polypeptide chain. First, the amino acids are all of the same enantiomer; they are L-amino acids. If the amino acid is viewed along the H-Cα bond, with the hydrogen towards the observer, the groups read clockwise CO-R-N.

[Figure: L-amino acid and D-amino acid.]

Figure 4.1.1. Trans and cis peptide groups for alanine and proline.

Second, the peptide bond is planar, due to the delocalisation of the electrons over the N-C-O of the peptide. It is almost always trans with respect to the polypeptide backbone (see Figure 4.1.1); the trans arrangement is normally favoured over cis by about 10^3-fold. However, the cis arrangement is more favourable for Xaa-Pro peptide bonds (Figure 4.1.1), where Xaa is any amino acid and Pro is proline, the only amino acid in which a carbon of the sidechain, instead of a hydrogen atom, is bound to the N. In that case, the trans:cis ratio is only 4:1. The energy barrier to rotation between cis and trans is 15-20 kcal/mol (63-84 kJ/mol).

In polypeptide chains the conformation of the backbone at each planar peptide unit is specified by two torsion angles: ψ (psi) refers to rotation about the Cα-C single bond; φ (phi) refers to rotation about the Cα-N bond. These angles are both close to 180° for the fully extended polypeptide chain, as shown in Figure 4.1.2.

[Figure: peptide planes drawn with both torsion angles at 180° and with both at 0°.]

Figure 4.1.2. Definition of torsion angles between peptide planes.

Certain combinations of ψ (psi) and φ (phi) are not allowed, because of steric hindrance of the peptide group with the side chain. For glycine, where there is no sidechain, more conformations are available.
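The torsion angles φ (phi) and ψ (psi) are ordinary dihedral angles defined by four consecutive backbone atoms, so they can be computed directly from atomic coordinates. The following is a minimal sketch in Python and is not part of the original notes: the function name `dihedral_deg` and the coordinates are illustrative assumptions. In a real chain, φ would use the atoms C(i-1), N(i), Cα(i), C(i) and ψ the atoms N(i), Cα(i), C(i), N(i+1).

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)  # points from p1 back to p0
    b1 = _sub(p2, p1)  # central bond, i.e. the rotation axis
    b2 = _sub(p3, p2)  # points from p2 on to p3
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(x / length for x in b1)
    # Strip the components along the axis, leaving the two bonds as they
    # appear when looking straight down the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates (not from a real structure): the central bond lies
# along the x axis and the outer atoms are placed trans or cis to each other.
trans = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
cis = dihedral_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
print(abs(trans), cis)
```

The trans arrangement gives an angle of magnitude 180° and the cis arrangement gives 0°, matching the two extremes drawn in Figure 4.1.2.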
The allowed combinations can be visualised using a Ramachandran plot or conformational map, which shows all possible combinations of φ and ψ and divides them into allowed regions, representing conformations where no steric hindrance exists, and disallowed regions, where steric hindrance occurs. Figure 4.1.3 shows the asymmetric plot for amino acids with a sidechain. Steric hindrance occurs as a result of the sidechain, and the most favourable conformations have torsion angles with negative φ (phi) values. An equivalent plot for glycine is symmetrical around the centre, as there is no side chain.

[Figure annotation: The Ramachandran plot. The allowed conformations are those where there are no steric clashes. Region labels: β, p, αL, αR.]

Figure 4.1.3. Ramachandran plot for residues with sidechains.

Ramachandran plots for real proteins show a variety of φ/ψ values, mainly with negative φ (phi) values, as expected from the presence of L-amino acids. A few asparagines, aspartic acids and some other amino acids have positive φ (phi) values, in addition to many of the glycines. This is because the sidechains of these amino acids can interact with the backbone atoms to stabilise conformations involving positive φ (phi) under some circumstances.

4.2 The major secondary structures

In the Ramachandran plot in Figure 4.1.3 the two allowed regions on the left side (with negative φ values), labelled αR and β/p, correspond to the conformations of the amino acid residues in the common secondary structures. αR corresponds to the α-helix (also known as the 3.6₁₃ helix). This has 3.6 residues per turn. It is a right-handed helix, with the carbonyls pointing towards the C-terminus of the helix, and the sidechains and NH groups towards the N-terminus (see Figure 4.2.1). A left-handed helix, with all residues having the conformation αL in the Ramachandran plot in Figure 4.1.3, would have the sidechains clashing with the carbonyls and is therefore rarely found, except in short regions with glycines.
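With 3.6 residues per turn, each residue of the helix is rotated by 360°/3.6 = 100° about the helix axis relative to the one before it. The short Python sketch below is an illustration rather than part of the original notes; it lists which residues following residue i lie on roughly the same side of the helix. The function names and the 60° half-width used to define "the same side" are arbitrary assumptions chosen for the example.

```python
def angular_offset(offset, residues_per_turn):
    """Angle in degrees (-180..180) of residue i+offset relative to residue i."""
    angle = offset * 360.0 / residues_per_turn
    return (angle + 180.0) % 360.0 - 180.0


def same_face_offsets(residues_per_turn, max_offset, half_width=60.0):
    """Offsets k (0..max_offset) whose residue i+k lies within half_width
    degrees of residue i when the helix is viewed down its axis."""
    tolerance = 1e-9  # guards against floating-point rounding at the boundary
    return [k for k in range(max_offset + 1)
            if abs(angular_offset(k, residues_per_turn)) <= half_width + tolerance]


print(same_face_offsets(3.6, 7))
```

For 3.6 residues per turn and offsets up to 7 this selects residues i, i+3, i+4 and i+7; a narrower or wider half-width would change the selection, so the cut-off should be treated as a modelling choice.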
The α-helix is further stabilised by hydrogen bonds between the CO of residue i and the NH of residue i+4, thereby closing a ring of 13 atoms: H-N-C-C-N-C-C-N-C-C-N-C-O. This is the 13 of the 3.6₁₃. Such helices are found widely in globular proteins and in fibrous proteins such as keratin. Tighter helices with three residues per turn (3₁₀-helix) or looser ones with 4.4 residues per turn are also possible, but have either less linear H-bonds or a hole in the middle and are consequently less stable. Only short helices of these kinds are found in globular proteins, and then often at the termini of the more stable α-helices.

A large number of amino acid residues fall into the αR region in haemoglobin. These residues are mainly in the α-helices which make up most of the structure, but they are also found in irregular regions that allow the chain to turn between the secondary structures. Some of the helices in haemoglobin tighten into 3₁₀-helices or loosen into helices with 4.4 residues per turn as the chain moves into these irregular loop regions.

All peptide CO groups point to the C-terminus of the α-helix, while the NH groups point towards the N-terminus. Thus, all peptide dipoles are aligned, giving rise to a helix dipole with its positive pole at the N-terminus and its negative pole at the C-terminus. This probably explains why the negatively charged residues, such as aspartate and glutamate, tend to ‘cap’ the helix at the N-terminus and positively charged residues, such as arginine, lysine and histidine, do so at the C-terminus. Also, helices in enzyme active sites that bind highly negatively charged groups, such as a phosphate, tend to have helix dipoles with positive poles close to the phosphate.

Figure 4.2.1. The right-handed α-helix found in proteins. The Cα, CO and NH atoms of the backbone and the Cβ atom of the sidechain are shown.

The p and the β allowed regions of the Ramachandran plot correspond to much more extended chains, in which intra-chain H-bond formation is not possible.
Instead the NH and carbonyl groups of the backbone are involved in inter-chain H-bonds. The β conformation leads to a slightly twisted, extended strand, actually a helix with slightly more than two (2.2) residues per turn. This can contribute to two kinds of slightly twisted sheet, one in which the strands run antiparallel and the other in which they run parallel (Figure 4.2.2); in both, the side-chains project alternately above and below the plane of the sheet. The antiparallel β-sheet is found in fibres such as silk and probably in amyloid, a structure that most proteins seem to form under extreme acid conditions but which some proteins form easily, giving rise to the amyloid fibrils of Alzheimer’s, prion diseases and amyloidosis, a genetic disease usually caused by mutations in normal proteins.

Both parallel and antiparallel sheets have the same twist. They would only be flat if the φ, ψ values were both 180°. In fact φ, ψ values are usually in the region of -120°, 140° for both kinds of sheets. This twist leads easily to the formation of barrels and open pores.

Both parallel and antiparallel β-sheets are found in globular proteins. In all-β proteins like the FGF or FGFR, antiparallel β-sheets are found and, in the absence of helices, seem to be more stable. Where there are alternating α/β structures, parallel sheets can be formed, as the chain can travel back to the other end of the sheet through an α-helix; for these the sequences are usually hydrophobic and stabilised by packing of the helices on either side. Such parallel β-strands with associated helices are found both in the N-domain of the FGFR kinase and in the HIV-proteinase, although in neither case are they pure parallel β-sheets as found in the Rossmann fold and barrels.

Inspection of the relative frequency of occurrence of amino acid residues in these regular secondary structures (Figure 4.2.3) shows that the preferences are not absolute.
Leucines, lysines and glutamates tend to favour α-helices, while valines and isoleucines favour β-strands. Proline tends to disfavour both, as it cannot form hydrogen bonds through an NH function, but nevertheless it is accommodated with small distortions, particularly in long secondary structures and of course at the N-termini of helices.

Studies on synthetic peptides have shown some of the reasons for the preferences for certain secondary structures: thus, poly(valine) cannot easily adopt an α-helical structure because of branching at the β carbon of the side chain. Poly(glutamic acid) is helical at pH 3, but not at pH 7, where the side chains are ionised. Subtler effects have also been revealed by such experiments. Thus, at pH 7.5, N-[Glu]₂₀-[Ala]₂₀-C has a high helical content for the alanine region, whereas N-[Ala]₂₀-[Glu]₂₀-C has very little α-helical character. The explanation again probably lies with the α-helix dipole; the negatively charged glutamate residues at the N-terminus should stabilise the helix.

Figure 4.2.2. β-sheet with antiparallel and parallel β-strands.

The p conformation, close to that of the β-sheet but with φ of ~-60°, leads to an extended helix with about three residues per turn. If every third residue is a glycine, three strands can hydrogen bond together to give a triple helix, which is found in collagen. It also occurs in the C1q component of complement (Stryer III, p. 899). But this conformation is also quite common in single strands in globular proteins, especially in regions where prolines occur, presumably because the cyclic sidechain of proline prevents conformations that are stabilised by intra-chain or inter-chain hydrogen bonding.

Figure 4.2.3.
The relative frequency of occurrence of amino acid residues in regular secondary structures

Amino acid   α-helix   β-sheet   β-turn
Ala          1.29      0.90      0.78
Cys          1.11      0.74      0.80
Leu          1.30      1.02      0.59
Met          1.47      0.97      0.39
Glu          1.44      0.75      1.00
Gln          1.27      0.80      0.97
His          1.22      1.08      0.69
Lys          1.23      0.77      0.96
Val          0.91      1.49      0.47
Ile          0.97      1.45      0.51
Phe          1.07      1.32      0.58
Tyr          0.72      1.25      1.05
Trp          0.99      1.14      0.75
Thr          0.82      1.21      1.03
Gly          0.56      0.92      1.64
Ser          0.82      0.95      1.33
Asp          1.04      0.72      1.41
Asn          0.90      0.76      1.28
Pro          0.52      0.64      1.91
Arg          0.96      0.99      0.88

Secondary structure can be predicted more accurately if we bring in information about the amino acid patterns expected in the sequence. Thus, in an α-helix which is packed against the core of a globular protein or against another helix in some fibrous proteins, we expect to see repeats of residues conserved as hydrophobic or hydrophilic at i, i+3, i+4, i+7, as they would all be on the same side of the helix. This can be seen in the haemoglobin sequences. In β-sheet proteins the residues protrude alternately on different sides of the sheet, so for an all-β protein like FGFR, where the strands are on the surface of the protein, residues would be expected to repeat at i, i+2, i+4 and so on. For proteins with poly-proline sequences the repeat is i, i+3, i+6 and so on. This is of course seen in collagen triple helices.

4.3 Reverse turns

When torsion angles differ in three or four consecutive amino acid residues, the polypeptide chain can change direction. Some combinations of torsion angles reoccur throughout globular proteins, and probably in some fibrous proteins like amyloid, giving rise to right-angle bends (half turns) or reverses in chain direction (reverse turns). For the common reverse turns, known as β-turns because they often occur between two antiparallel strands in a β-sheet, there is a hydrogen bond between the CO of residue i and the NH of residue i+3.
Type II turns fit particularly well to the twist of the antiparallel sheet, but one side chain is usually glycine to allow a conformation with a positive φ torsion angle. Type III turns resemble a half turn of the 3₁₀-helix described above; in fact they have amino acid residues i+1 and i+2 at the turn with similar torsion angles to those of the 3₁₀-helix.

Reverse turns are a common feature of proteins with all-β structures. A good example is the immunoglobulin domains D2 and D3 of the FGFR structure, where the chain needs to reverse between each strand, but they are also found in the irregular regions between helices, for example in haemoglobin. Glycines are often found in turns, so that the chain can change direction sharply. Prolines are also quite common, as they prevent H-bonding and so ‘cap’ helices; this is a very common feature of the structure of haemoglobin. Reverse turns are usually found exposed to solvent, and often form H-bonds to water. Larger loop regions at surfaces are critically important in many recognition processes, e.g. at enzyme active sites and in antibody-antigen interactions (see later).

4.4 Quantitation of secondary structure in polypeptides: circular dichroism

The spectral properties of polypeptide chains in different secondary structures can be used to distinguish them. Because they are constructed from L-amino acids, natural peptides absorb left- and right-circularly polarised light unequally. This measurable difference, termed ellipticity and constituting circular dichroism (CD), is highly sensitive to the conformation of the backbone. The magnitude of the effect, quoted as the molar ellipticity [θ], is plotted against the wavelength of the incident light. The CD spectrum of a protein will be the weighted average of the contributions from the individual spectra for the various secondary structure types found in the protein.

Figure 4.4.1. The circular dichroism of poly-lysine in three different states:
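The description in Section 4.4 of a protein CD spectrum as a weighted average of reference spectra can be turned into a simple estimate of secondary structure content. The Python sketch below is an illustration only and is not taken from the original notes: the reference values in `BASIS` are invented placeholder numbers (not measured ellipticities), the state names and function names are hypothetical, and an unconstrained least-squares fit is just one simple way to invert the weighted average.

```python
def mix_spectra(fractions, basis):
    """Fraction-weighted sum of the basis spectra, wavelength by wavelength."""
    names = list(basis)
    n_points = len(basis[names[0]])
    return [sum(fractions[name] * basis[name][j] for name in names)
            for j in range(n_points)]


def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - known) / aug[row][row]
    return x


def fit_fractions(spectrum, basis):
    """Least-squares fractions from the normal equations (A^T A) x = A^T y."""
    names = list(basis)
    ata = [[sum(p * q for p, q in zip(basis[a], basis[b])) for b in names]
           for a in names]
    aty = [sum(p * y for p, y in zip(basis[a], spectrum)) for a in names]
    return dict(zip(names, solve_linear(ata, aty)))


# Placeholder reference spectra at five arbitrary wavelengths (invented values).
BASIS = {
    "helix": [-33.0, -36.0, -30.0, 20.0, 75.0],
    "sheet": [-8.0, -18.0, -10.0, 25.0, 30.0],
    "turn": [1.0, 4.0, -3.0, -35.0, -20.0],
}

true_fractions = {"helix": 0.5, "sheet": 0.3, "turn": 0.2}
observed = mix_spectra(true_fractions, BASIS)   # forward model
estimated = fit_fractions(observed, BASIS)      # inverse problem
print(estimated)
```

Because the synthetic spectrum is an exact mixture, the fit recovers the input fractions to within rounding error. With real data the fit is not constrained to give non-negative fractions that sum to one, so those constraints would have to be added or checked separately.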