Amino Acids

Amino acids are the building blocks of proteins. The sequence of amino acids in each protein is encoded in the DNA of the cell. The physical and chemical properties of the 20 different, naturally occurring amino acids dictate the shape of the protein and its interactions with its environment. Certain short sequences of amino acids in a protein also dictate where the protein resides in the cell. Proteins are typically composed of hundreds to thousands of amino acids. As you can imagine, protein folding is a complicated process, and the large number of possible combinations of amino acids gives rise to many potential shapes. By understanding the properties of the amino acids you will gain an appreciation for the limits of protein folding and for how to predict the potential higher order structure of a protein.

All amino acids have the same backbone structure: an amino group (the α-amino group), a carboxyl group, an α-hydrogen, and a variable functional group (R), all attached to the α-carbon. The atoms that are common to all amino acids are called the mainchain or backbone atoms because they form the mainchain of the protein polymer.

The general structure of an α-amino acid is shown on the left. The amino group (blue) has a pKa of ~9, so it is protonated at pH 7.0. The carboxylic acid group (red) has a pKa of ~2.0, so it is deprotonated at pH 7.0. The amino group and the carboxylic acid group are joined by the α-carbon. The α-carbon, α-hydrogen, amino group and carboxyl group are found in all amino acids; the R group varies from amino acid to amino acid. The structure on the right shows the amino acid leucine, whose R group is a 2-methylpropyl (isobutyl) group. The carbons of the sidechain are named with successive letters of the Greek alphabet: the β-carbon is bonded to the α-carbon, the γ-carbon to the β-carbon, and so on.
The R groups, which differ from one amino acid to the next, form the sidechains, because those atoms project out to the side of the linear protein polymer.

Chirality

Because four different groups are attached to the α-carbon (in every common amino acid except glycine, whose R group is a second hydrogen), the α-carbon is an asymmetric, or chiral, center. This means that the molecule cannot be superimposed on its mirror image; the two mirror-image forms are called enantiomers. To determine the absolute configuration of a chiral center, follow these steps:

1. Label the four groups attached to the chiral center with the numbers 1-4 in order of priority: 1 is the group whose attached atom has the highest atomic number, and 4 is the lowest. If two attached atoms are the same, apply the rule to the next atoms out; e.g. C-C-H has lower priority than C-C-OH.
2. Orient the molecule so that group 4 points away from you.
3. If 1→2→3 runs counter-clockwise, the configuration is S (Latin "sinister", left). Imagine pointing the thumb of your left hand in the same direction as group 4; your fingers curl counter-clockwise.
4. If 1→2→3 runs clockwise, the configuration is R (Latin "rectus", right). Imagine pointing the thumb of your right hand in the same direction as group 4; your fingers curl clockwise.

Although enantiomers have the same physical properties (apart from the direction in which they rotate plane-polarized light), they generally have quite different biological properties. For amino acids and other biological compounds, chirality is also indicated by a second labeling scheme, D and L, which is based on the configuration of a reference compound, D- or L-glyceraldehyde. Bioselectivity has led to the dominance of one chirality for amino acids in nature: all of the common chiral amino acids have the same configuration as L-glyceraldehyde. Although most of these L-amino acids are S, cysteine is an exception: its absolute configuration is R when the rules above are applied, because the sulfur in its CH2SH sidechain gives that group a higher priority than the carboxyl group.
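The left-hand/right-hand rule above is equivalent to checking the sign of a signed volume (a scalar triple product). The sketch below assumes you already have 3D coordinates for the four substituents and have ranked them with the priority rules; it does not assign priorities itself. The function name `cip_label` and the coordinates in the usage lines (an idealized tetrahedron, not a real amino acid) are hypothetical.

```python
def _sub(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cip_label(p1, p2, p3, p4):
    """Return 'R' or 'S' for substituent positions ranked 1 (highest) to 4 (lowest).

    Vectors are taken from group 4, so the sign of the triple product tells
    whether 1 -> 2 -> 3 runs clockwise (negative, R) or counter-clockwise
    (positive, S) when group 4 points away from the viewer.
    """
    a, b, c = (_sub(p, p4) for p in (p1, p2, p3))
    signed_volume = _dot(a, _cross(b, c))
    if signed_volume == 0:
        raise ValueError("substituents are coplanar; not a tetrahedral center")
    return "R" if signed_volume < 0 else "S"


# Hypothetical idealized geometry: group 4 points down the -z axis, and
# groups 1, 2, 3 sit at 12, 4 and 8 o'clock as seen from +z.
ideal = [(0.0, 0.94, 0.33), (0.82, -0.47, 0.33), (-0.82, -0.47, 0.33), (0.0, 0.0, -1.0)]
label = cip_label(*ideal)                                    # 'R'
swapped = cip_label(ideal[0], ideal[2], ideal[1], ideal[3])  # 'S'
```

Swapping any two substituents flips the sign of the triple product, which is why exchanging groups 2 and 3 turns R into S.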
Sidechain properties

If all of the amino acids share the same basic structure, with an amino group, a carboxyl group and a hydrogen attached to the α-carbon, then the large variation in their properties and structures must come from the fourth group attached to the α-carbon. This group is referred to as the sidechain or R group of the amino acid. The structures of the 20 common amino acids are shown below.

Common Amino Acids

Instructions: You should become familiar with the functional groups in the sidechain of each amino acid, and you should be able to infer the properties of a sidechain from its 2D chemical diagram and its 3D structure. For example, which amino acids have polar sidechains? Which have planar aromatic groups? You can review the basic functional groups that were discussed in the first lecture by opening this activity.

The structures of the 20 common amino acids are shown in the table. Clicking on any of the 2D drawings will display that amino acid's 3D structure in the Jmol window on the right. Initially, a 2D drawing of the simplest amino acid, glycine, is shown in the upper left and its 3D structure is shown on the right. The mainchain atoms of glycine are highlighted in yellow and its sidechain (H) is highlighted in green. All amino acids have the same mainchain atoms but differ in their sidechains. For clarity, the α-hydrogen is omitted in the remaining drawings. Non-polar amino acids are highlighted in grey, aromatic amino acids in cyan, polar amino acids in purple, amino acids with acidic sidechains in red, and amino acids with basic sidechains in blue.
The amino acids cysteine and proline, which are shown at the bottom of the page, have unique properties:

Cysteine can form a covalent S-S (disulfide) bond with another cysteine, stabilizing the protein structure by crosslinking.
Proline: the sidechain is bonded to its own mainchain nitrogen, giving a secondary amine.

Acid-Base Properties of Amino Acids and Their Side-Chains

Every amino acid has at least two ionizable groups: the mainchain α-amino and α-carboxyl groups. In addition, the sidechains of a number of amino acids have pKa values in the range 2-12 and can therefore ionize in biochemical systems. The structures of these sidechains, along with their pKa values, are shown in the figure below.

At neutral pH (7.0) the mainchain amino group of an amino acid is positively charged and the mainchain carboxyl group is negatively charged. For amino acids with uncharged sidechains, the positive charge on the amino group cancels the negative charge on the carboxyl group, giving a net charge of zero; such a molecule is called a zwitterion.

The overall protonation state of an amino acid depends on the pH of the solution and on the pKa values of its ionizable groups. The overall charge at any pH can be calculated by summing over all n ionizable groups:

$$q_{Total} = \sum_{i=1}^{n} \left( f_{HA,i}\, q_{HA,i} + f_{A^-,i}\, q_{A^-,i} \right)$$

where $f_{HA,i}$ and $f_{A^-,i}$ are the fractions of group i in the protonated and deprotonated forms, and $q_{HA,i}$ and $q_{A^-,i}$ are the charges of those two forms. From the Henderson-Hasselbalch equation, $f_{HA,i} = 1/(1 + 10^{pH - pK_{a,i}})$ and $f_{A^-,i} = 1 - f_{HA,i}$. An example of calculating the charge on the amino acid glycine is given in the previous lecture.

The isoelectric pH (pI) is the pH at which a molecule (amino acid, protein, etc.) has no net charge. Using the above formula, it can be shown that the pI of an amino acid without an ionizable sidechain is the average of the pKa values of its amino and carboxyl groups:

$$pI = \tfrac{1}{2}\left(pK_{a,NH_2} + pK_{a,COOH}\right)$$

This simple formula does not apply to amino acids with ionizable sidechains; in that case the general formula for the charge (shown above) must be used to find the pH at which the average charge on the molecule is zero.
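The charge formula and the pI definition above can be sketched directly: sum the Henderson-Hasselbalch fractions over all groups, then search for the pH where the sum is zero. The carboxyl (~2.0) and amino (~9) pKa values come from the text; the sidechain pKa of 3.9 for the aspartate-like example is an assumed textbook value, and all function names are hypothetical.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of a group in the protonated (HA) form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


def net_charge(pH, groups):
    """q_total = sum(f_HA * q_HA + f_A * q_A) over all ionizable groups.

    groups is a list of (pKa, q_HA, q_A) tuples, e.g. (9.0, +1, 0) for an
    alpha-amino group or (2.0, 0, -1) for a carboxyl group.
    """
    total = 0.0
    for pKa, q_ha, q_a in groups:
        f_ha = fraction_protonated(pH, pKa)
        total += f_ha * q_ha + (1.0 - f_ha) * q_a
    return total


def isoelectric_point(groups, lo=0.0, hi=14.0, tol=1e-6):
    """Bisection for the pH where net_charge is zero.

    This works because every group loses a proton (and a unit of positive
    charge) as the pH rises, so the net charge only decreases with pH.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(mid, groups) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Mainchain values from the text: carboxyl pKa ~2.0, amino pKa ~9.
glycine_like = [(2.0, 0, -1), (9.0, +1, 0)]
# Aspartate-like: the extra sidechain carboxyl pKa of 3.9 is an assumption.
aspartate_like = glycine_like + [(3.9, 0, -1)]

print(round(isoelectric_point(glycine_like), 2))    # 5.5, the mean of 2.0 and 9.0
print(round(isoelectric_point(aspartate_like), 2))  # 2.95, the mean of the two acidic pKas
```

For the aspartate-like case the simple two-pKa average of the mainchain groups no longer applies; the numerical search lands midway between the two carboxyl pKa values instead.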
Ionizable Amino Acids

The protonated (left) and deprotonated (right) forms of the ionizable amino acids are shown. Note that the acidic residues become negatively charged when they lose a proton, while the basic residues become neutral.

UV Absorption of Amino Acids

Tryptophan (Trp), tyrosine (Tyr), and phenylalanine (Phe) contain conjugated aromatic rings and consequently absorb light in the ultraviolet (UV) range. Light absorption is quantified by the absorbance, A = log10(I0/I), where I0 is the intensity of the incident light and I is the intensity of the light that leaves the sample. If no light is absorbed, then I = I0 and A = 0; as the intensity of the transmitted light decreases, A increases.

The amount of light absorbed by a solution of chromophores is characterized by the molar extinction coefficient, ε, which is the absorbance that would be measured for a 1 M solution in a cell with a 1 cm pathlength. The extinction coefficients for Trp, Tyr, and Phe are listed below.

Amino Acid   ε at λmax (M-1 cm-1)   λmax
Trp          5,050                  280 nm
Tyr          1,440                  274 nm
Phe          220                    257 nm

Because absorption by Trp residues dominates, most proteins show a maximum light absorbance at a wavelength of about 280 nm. The absorbance of a solution is given by the Beer-Lambert law: A = εl[X], where [X] is the concentration of the absorbing species in mol/L and l is the pathlength of the light (usually 1 cm). Given a known extinction coefficient, the concentration of a protein can therefore be determined from its absorbance. Note that accurate absorbance measurements seldom exceed ~3.0: at A = 3.0 only 0.1% of the incident light is transmitted, so very little light reaches the detector in the instrument.

Calculation of molar extinction coefficients: If a protein contains a mixture of N different chromophores, the absorbance can generally be assumed to be additive.
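The Beer-Lambert law and the additivity assumption above combine into a short calculation: build ε for the protein as a weighted sum over its chromophores, then divide the absorbance by ε·l. This is a sketch under stated simplifications: it reuses the λmax values from the table as if they applied at a single wavelength (strictly, all coefficients should be taken at the measurement wavelength, usually 280 nm), and the residue counts and absorbance in the usage lines are hypothetical.

```python
# Extinction coefficients from the table above, in M^-1 cm^-1.
# Simplification: treated as if all were measured at the same wavelength.
EPSILON = {"Trp": 5050.0, "Tyr": 1440.0, "Phe": 220.0}


def protein_epsilon(counts):
    """Additive model: epsilon = sum(epsilon_i * n_i) over chromophore types."""
    return sum(EPSILON[aa] * n for aa, n in counts.items())


def concentration(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law A = epsilon * l * [X], solved for [X] in mol/L."""
    return absorbance / (epsilon * path_cm)


def fraction_transmitted(absorbance):
    """I / I0 = 10 ** (-A), because A = log10(I0 / I)."""
    return 10.0 ** (-absorbance)


# Hypothetical protein with 2 Trp, 5 Tyr and 3 Phe residues.
eps = protein_epsilon({"Trp": 2, "Tyr": 5, "Phe": 3})  # 17960.0 M^-1 cm^-1
conc = concentration(0.898, eps)                         # about 5.0e-05 M
t_at_3 = fraction_transmitted(3.0)                       # about 0.001, i.e. 0.1%
```

The last line shows why readings near A = 3 are unreliable: only a thousandth of the incident light reaches the detector.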
Because absorbance is additive, the molar extinction coefficient for the entire protein is:

$$\varepsilon = \sum_{\text{absorbing groups}} \varepsilon_i n_i = \varepsilon_{Trp}\, n_{Trp} + \varepsilon_{Tyr}\, n_{Tyr} + \varepsilon_{Phe}\, n_{Phe}$$

where $n_i$ is the number of residues of chromophore type i in the protein, and each $\varepsilon_i$ is taken at the wavelength of the measurement.

Peptide Bond

Proteins are polymers of amino acids. The amino acids are joined together by a condensation reaction similar to that described for the formation of the glycosidic bond in polysaccharides. Each amino acid in the polymer is referred to as a residue. Individual amino acids are joined when the nitrogen of the amino group of one amino acid attacks the carbonyl carbon of the carboxyl group of another, creating a covalent peptide bond and releasing a molecule of water, as shown below.

Peptide bond formation occurs by a dehydration reaction. The amino group of the second amino acid attacks the carbonyl carbon of the first, forming the peptide bond and releasing water. The resulting dipeptide has an amino terminus (left) and a carboxy terminus (right). The mainchain atoms, which are the same for each residue in the peptide, include the nitrogen and its hydrogen, the α-carbon and its hydrogen, and the C=O group. The R groups form the sidechain atoms.

The resulting peptide chain is linear, as defined by the mechanism that builds the polymer, and has defined ends. Short polymers (fewer than about 50 residues) are usually referred to as peptides, and longer polymers as proteins. Because each bond forms between the α-amino group of one amino acid and the α-carboxyl group of another, there is always a free amino group at one end of the growing polymer (the N-terminus) and a free carboxyl group at the other end (the C-terminus). Note that amide (peptide) links could in principle form between carboxyl and amino groups in the sidechains, but bioselectivity directs the synthesis to be linear, involving only the α-amino and α-carboxyl groups.
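Because each peptide bond releases one molecule of water, a linear peptide of n residues weighs the sum of its free amino acids minus (n - 1) waters. A minimal sketch of that bookkeeping, assuming rounded average molecular masses (g/mol) for a handful of amino acids; the table is deliberately incomplete and `peptide_mass` is a hypothetical helper name.

```python
# Approximate average masses of the free amino acids, g/mol (partial table).
AVERAGE_MASS = {
    "Gly": 75.07,
    "Ala": 89.09,
    "Ser": 105.09,
    "Leu": 131.17,
    "Met": 149.21,
    "Phe": 165.19,
}
WATER = 18.015  # g/mol, released once per peptide bond


def peptide_mass(sequence):
    """Mass of a linear peptide written N-terminus first, e.g. 'Gly-Ala-Ser'."""
    residues = sequence.split("-")
    total = sum(AVERAGE_MASS[r] for r in residues)
    # n residues are joined by n - 1 peptide bonds, each a dehydration.
    return total - (len(residues) - 1) * WATER


dipeptide = peptide_mass("Gly-Gly")  # about 132.1 g/mol
single = peptide_mass("Ala")         # no peptide bond, so just 89.09
```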
Note that once an amino acid has been incorporated into a protein, the charges on its α-amino and α-carboxyl groups disappear, except at the free N- and C-termini of the chain. The mainchain atoms thus become uncharged but polar amide groups. Since each residue in a protein has exactly the same mainchain atoms, the functional properties of a protein must arise from its different sidechain groups.

By convention, the sequences of peptides and proteins are written with the N-terminus on the left and the C-terminus on the right: the N-terminal residue is named first, followed by each subsequent residue in order. The primary sequence of a protein refers to its amino acid sequence.

OVERVIEW OF PROTEIN STRUCTURE

A protein is composed of amino acids attached in a linear order. This basic level of protein structure is called its primary structure and derives from the formation of peptide bonds between the individual amino acids. Each amino acid in the linear polymer is referred to as a residue. The order, or sequence, of the amino acids is determined by information encoded in the cell's genes. An example of a protein sequence is shown below, using the one-letter abbreviations for each of the 20 amino acids used in cellular protein synthesis.

Amino acid sequence of the human estrogen receptor, given in the single-letter code.

Higher order structure is determined by the Primary Structure

Proteins do not exist in the cell as linear threads but rather as spontaneously folded higher order structures. The higher order structure is determined by the amino acids in the primary structure. Usually the sequence alone is sufficient to generate the higher order structure, but some proteins require chaperones to help them fold. The stages or levels of protein structure are:

Primary Structure: the amino acid sequence of the protein, with no regard for the conformation of the amino acids.
Secondary Structure: interactions involving only mainchain (also known as backbone) atoms, resulting in α-helices and β-sheets. Mainchain atoms are the N-Cα-C=O atoms that form the backbone of the protein polymer.
Tertiary Structure: long-range interactions resulting in the 3D folding of a single polypeptide chain.
Quaternary Structure: the association of two or more polypeptide chains to make a functional protein. For example:
a homodimer contains two identical chains, represented as α2
a homotrimer contains three identical chains, represented as α3
a heterodimer contains two different chains, represented as αβ
a heterotrimer can contain two identical chains plus one different chain (α2β), or three different chains (αβγ)
a heterotetramer often contains two pairs of identical chains (α2β2), but can contain four different chains (αβγδ)

Example - Structure Hierarchy in Hemoglobin

The oxygen transport protein hemoglobin is shown in this Jmol. The heme groups, colored purple, are responsible for binding oxygen; the protein component of hemoglobin is colored gray. Hemoglobin looks complicated, but we can understand its structure using a hierarchical description:

Primary Structure is the sequence of amino acids. Hemoglobin has four separate polypeptide chains, two α and two β (α2β2).
Secondary Structure describes the local structure of just the mainchain atoms. Each subunit of hemoglobin contains a number of α-helical secondary structural elements.
Tertiary Structure is the complete description of the structure of both the mainchain and sidechain atoms of one polypeptide chain. Clicking on the button will show you the tertiary structure of one of the subunits of hemoglobin. Of course, the tertiary structure is built up from secondary structural elements.
Quaternary Structure is the complete description of the structure of all of the different polypeptide chains that make up the functional molecule.
Clicking on the button will show you the complete quaternary structure of hemoglobin. The quaternary structure is, in turn, built up from the tertiary structures of the individual subunits.

Determining Primary Structure

We will focus on N-terminal sequencing of the protein itself using Edman degradation. Larger proteins may first need to be fragmented into peptides. Note that protein sequences can also be inferred from the DNA sequence, or determined experimentally by mass spectrometry.

Edman Degradation: The detailed chemical mechanism of Edman degradation will not be discussed here; however, an overview of the Edman chemistry is shown here:

Sequencing long Proteins: It is generally not possible to sequence an entire protein from the amino terminus. To extend the sequence information, the protein is fragmented into smaller peptides. After cleavage, the individual peptide fragments are separated from each other, and each is independently subjected to N-terminal sequencing by Edman degradation. Three common fragmentation reactions are:

Cyanogen bromide (CNBr) cleaves the peptide bond after methionine residues. For example:
$$\text{Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile} \xrightarrow{\text{CNBr}} \text{Ser-Met} + \text{Gly-Ala-Phe-Arg-Leu-Ile}$$
Chymotrypsin hydrolyzes the peptide bonds that follow large hydrophobic residues, e.g. phenylalanine, tyrosine and tryptophan. For example:
$$\text{Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile} \xrightarrow{\text{Chymotrypsin}} \text{Ser-Met-Gly-Ala-Phe} + \text{Arg-Leu-Ile}$$
Trypsin hydrolyzes the peptide bonds that follow positively charged residues, e.g. lysine and arginine. For example:
$$\text{Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile} \xrightarrow{\text{Trypsin}} \text{Ser-Met-Gly-Ala-Phe-Arg} + \text{Leu-Ile}$$

If the cleavage reaction produces only two fragments, it is straightforward to reconstruct the sequence, because the N-terminal sequence of the intact protein identifies which fragment comes first. However, if the protein is cleaved into three or more fragments, the correct order of the fragments cannot be determined using a single cleavage agent.
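The three cleavage rules above can be simulated by cutting after every residue in a reagent's site set. This is a simplified sketch: it encodes only the rules stated in the text and ignores real enzyme subtleties such as reduced cleavage when the next residue is proline. Because sequences are often stored in the single-letter code mentioned earlier, it also includes a one-to-three-letter converter. All table and function names here are hypothetical.

```python
# Cleavage sites as stated in the text: each reagent cuts the peptide bond
# on the C-terminal side of the listed residues (simplified rules).
CLEAVAGE_SITES = {
    "CNBr": {"Met"},
    "chymotrypsin": {"Phe", "Tyr", "Trp"},
    "trypsin": {"Lys", "Arg"},
}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(one_letter_seq):
    """Convert e.g. 'SMG' into 'Ser-Met-Gly'."""
    return "-".join(ONE_TO_THREE[c] for c in one_letter_seq.upper())


def cleave(sequence, reagent):
    """Split a hyphenated three-letter sequence after each cleavage site."""
    sites = CLEAVAGE_SITES[reagent]
    fragments, current = [], []
    for residue in sequence.split("-"):
        current.append(residue)
        if residue in sites:  # cut after this residue
            fragments.append("-".join(current))
            current = []
    if current:  # trailing fragment with no site at its C-terminus
        fragments.append("-".join(current))
    return fragments


peptide = to_three_letter("SMGAFRLI")  # 'Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile'
by_cnbr = cleave(peptide, "CNBr")      # ['Ser-Met', 'Gly-Ala-Phe-Arg-Leu-Ile']
by_trypsin = cleave(peptide, "trypsin")  # ['Ser-Met-Gly-Ala-Phe-Arg', 'Leu-Ile']
```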
Multiple sets of overlapping fragments must be used to determine the correct order, as illustrated below.

Sequence Determination

Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu

In this example I have assumed that 6 cycles of Edman degradation are possible; after that, impurities and side reactions prevent reliable identification of the amino acid. Note that in practice 30-100 cycles can be accomplished, giving the sequence of the first 30-100 residues of the protein.

A: The first six cycles of Edman degradation produced Ala, Gly, Met, Ser, Thr, and Gly, in that order. Therefore the amino-terminal sequence is: Ala-Gly-Met-Ser-Thr-Gly

B: A new sample of the peptide was treated with CNBr. The two peptides that were produced (CNBr-1, CNBr-2) were isolated and each was subjected to Edman degradation. Only the first six residues of each fragment can be identified by Edman degradation; the remainder of the peptide is present but not detectable.
CNBr-1: Ala-Gly-Met
CNBr-2: Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu

C: A new sample of the peptide was treated with trypsin. The two peptides that were produced (Trp1, Trp2) were isolated and each was subjected to Edman degradation. The sequences of these two peptides were:
Trp1: Gly-Ser-Ala-Phe-Leu
Trp2: Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys

Strategy: Find overlaps between fragments obtained with different cleavage reagents, and use these overlaps to correctly order the peptides obtained from each cleavage reaction. The overlaps can be identified readily by finding a site in one peptide that would be cut by another cleavage reagent (e.g. trypsin), and then identifying the matching fragment from its expected amino-terminal sequence.
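The overlap strategy above can be sketched as a suffix/prefix match between two reads, followed by the trypsin ordering step. This is a greedy illustration under simple assumptions: a real assembly must cope with ambiguous or repeated overlaps, and `merge_overlapping` is a hypothetical helper. The reads are taken from the worked example.

```python
def merge_overlapping(left, right, min_overlap=2):
    """Join two hyphenated sequences on their longest suffix/prefix overlap.

    Returns None if no overlap of at least min_overlap residues exists.
    """
    min_overlap = max(min_overlap, 1)  # a zero-length overlap is not evidence
    a, b = left.split("-"), right.split("-")
    for k in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a[-k:] == b[:k]:
            return "-".join(a + b[k:])
    return None


# Reads from the worked example (six Edman cycles per peptide).
edman_intact = "Ala-Gly-Met-Ser-Thr-Gly"
cnbr2_read = "Ser-Thr-Gly-Val-Val-Lys"
trypsin_fragments = {
    "Trp1": "Gly-Ser-Ala-Phe-Leu",
    "Trp2": "Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys",
}

# Step 1: the intact N-terminal read and CNBr-2 share Ser-Thr-Gly.
partial = merge_overlapping(edman_intact, cnbr2_read)

# Step 2: partial ends in Lys, a trypsin site. The trypsin fragment that
# begins with the intact N-terminal sequence comes first, so the other one
# must follow the Lys.
following = [s for s in trypsin_fragments.values() if not s.startswith(edman_intact)]
full = partial + "-" + following[0]
```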
In this example, the sequence from the Edman degradation of the intact peptide contains a Met residue, so look for overlaps between the intact sequence and the two CNBr fragments:

Intact peptide:  Ala-Gly-Met-Ser-Thr-Gly
CNBr-1:          Ala-Gly-Met
CNBr-2:                      Ser-Thr-Gly-Val-Val-Lys
Combined:        Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys

The partial sequence above ends in a Lys residue, a trypsin cleavage site, so the residue that follows it must begin one of the trypsin fragments. Trp2 begins with the N-terminal sequence Ala-Gly-Met, so it is the first fragment; Trp1, which starts with Gly, must therefore be the second fragment, allowing completion of the sequence:

Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu

Before discussing secondary structure, it is important to appreciate the conformational plasticity of proteins. Each residue in a polypeptide has three mainchain bonds (N-Cα, Cα-C, and the C-N peptide bond) that are potentially free to rotate. The conformation about these bonds describes the secondary structure of the protein. The rotation angle about a bond is referred to as a torsional angle. A torsional angle defines the relative orientation of four atoms in space; it is the angle between two planes, one containing the first three atoms and the other containing the last three. The torsional angle about the N-Cα bond is shown below.

Conformation of the C(i-1)-N Peptide Bond

The atoms around this bond lie in a plane because the carbonyl carbon and the nitrogen are both sp2 hybridized. In addition, free rotation about the bond is not possible, because the pz orbitals of the oxygen, carbon, and nitrogen form a delocalized system. Rotation about the peptide bond would break the interaction between the pz orbitals of the nitrogen and carbon atoms and is therefore unfavorable; the peptide bond is said to be a "partial double bond". The atomic orbitals of the mainchain atoms are shown. The carbonyl carbon uses sp2 hybrid orbitals to bond to the carbonyl oxygen and the nitrogen. Consequently, the oxygen, carbon, and nitrogen all lie in the same plane.
Since the nitrogen also uses sp2 hybrid orbitals, the amide hydrogen lies in the same plane. The second bond between the carbonyl carbon and oxygen is formed by overlap of their pz orbitals. The nitrogen pz orbital is also positioned to overlap with the carbon pz orbital. Consequently, the electrons in all three pz orbitals form a delocalized system, resulting in a partial double bond between the carbonyl carbon and the nitrogen.

Cis and Trans Peptide Bonds: Two orientations of the peptide bond maintain a favorable pz interaction between the carbon and nitrogen. They are related by a 180° flip of the peptide bond, generating the trans form and the cis form. For all peptide bonds, the trans form is more stable than the cis form. The higher energy of the cis form is due, in part, to overlap between the α-hydrogens on adjacent residues (non-proline) or between the α- and δ-hydrogens in the case of proline. Use the buttons on the Jmol images below to highlight these overlaps.

Ala-Ala Trans and Cis Peptide Bonds. For linkages between non-proline residues, the unfavorable overlap of the mainchain (and sidechain) atoms makes the cis form less stable by about 15 kJ/mol, giving a trans to cis ratio of roughly 1000:1. The molecular crowding of the α-hydrogens in the cis form is evident in the Jmol image on the far right.

Ala-Pro Trans and Cis Peptide Bonds. Both the trans and cis forms of this peptide bond result in overlap of atoms, raising the energy of the trans form so that it is only ~4 kJ/mol lower than the cis form. In the trans form the crowding involves the α-hydrogen of the preceding residue and the δ-hydrogens of the proline; in the cis form it involves the two α-hydrogens.

N-Cα and Cα-C Bonds: The torsional angles associated with each of these bonds are defined as:
Φ (phi), the angle about the bond between N and Cα
Ψ (psi), the angle about the bond between Cα and C
There is free rotation about both of these bonds.
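Φ and Ψ, like the angle about the peptide bond itself, can be computed from the coordinates of four consecutive mainchain atoms. The sketch below uses the standard atan2 formula for a dihedral angle, with the usual sign convention (positive when the near bond must be turned clockwise, viewed along the central bond, to eclipse the far bond); under it a cis arrangement gives 0° and a trans arrangement gives 180°. It also converts an energy difference into a population ratio with the Boltzmann relation K = exp(ΔG/RT), assuming 25 °C. The coordinates are idealized, hypothetical points rather than atoms from a real structure, and the function names are illustrative.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def trans_cis_ratio(delta_g_kj, temp_k=298.15):
    """Boltzmann ratio [trans]/[cis] when cis is higher by delta_g_kj kJ/mol."""
    return math.exp(delta_g_kj * 1000.0 / (R * temp_k))


# Idealized points: central bond along +z from the origin.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))     # 0.0
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))  # 180.0
non_pro = trans_cis_ratio(15.0)  # about 425
x_pro = trans_cis_ratio(4.0)     # about 5, i.e. roughly 17% cis
```

With ΔG = 15 kJ/mol the Boltzmann relation gives about 425:1 at 25 °C, the same order of magnitude as the ratio quoted above; the ~4 kJ/mol gap for bonds preceding proline leaves a much larger cis population.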
Not all torsional angles are equally likely. Three torsional angles are more stable than the others, and they are related to each other by 120° rotations about the bond. The three stable positions are more easily seen with a simpler molecule, such as 1-chloro-1-fluoroethane. Note that the three stable conformations minimize the interactions between the atoms on each carbon by maximizing their distance from each other. These conformations are also stabilized by favorable interactions between the molecular orbitals of the molecule.

For amino acid residues in a protein, the bulky sidechain atoms restrict the phi and psi angles of a residue to three pairs of values that are relatively low in energy:

Three possible phi/psi angles of a residue
Φ = -60°, Ψ = -45°
Φ = +60°, Ψ = +45°
Φ = -120°, Ψ = +125°

Secondary Structures

Proteins consist of a linear chain of amino acids, with each amino acid representing a building block. The shape of each block depends on the Φ and Ψ angles of its residue. In regular secondary structures these angles are the same for each residue, so each building block, or amino acid, has the same shape. If a series of identically shaped objects is laid end to end, it will form a regular geometrical structure. In two dimensions there are two possibilities: either a straight chain or some type of circle (which may or may not be closed). A straight chain occurs if there is no curvature in the block, while a circle results if there is any degree of curvature; the radius of the circle is related to the degree of curvature. For three-dimensional building blocks that have some degree of curvature on both faces, the two-dimensional circular structure becomes a helix. If the building block is a perfect rectangular prism, the structure will remain linear.
Given the possible values of the Φ and Ψ angles, many different shapes of the amino acid building block are possible, and therefore many different three-dimensional structures are possible. Only two are commonly observed in proteins: the right-handed α-helix and β-structures. A left-handed α-helix is also stable, but relatively rare. These conformations are stable because they:

maximize mainchain hydrogen bonding,
maximize van der Waals interactions of mainchain atoms, and
minimize steric clashes of mainchain and sidechain atoms.

In all cases these secondary structures have characteristic values of the Φ and Ψ torsional angles that are the same for each residue within a particular secondary structure, and each peptide bond is rigid and planar in the trans conformation.

Alpha helix (Φ = -60°, Ψ = -45°)
1. Dimensions and geometry: 3.6 residues/turn; pitch = 5.4 Å/turn; rise/residue = 1.5 Å.
2. H-bonds are parallel to the helix axis.
3. Sidechains point outwards.
4. The right-handed form is more stable.

Beta Structures (Φ = -120°, Ψ = +125°)
These particular phi and psi torsional angles generate a building block that is almost a perfect rectangular prism. Consequently, β-structures consist of straight, fully extended strands of linked amino acids, called β-strands. Because the polypeptide is extended, mainchain hydrogen bonds cannot form between nearby residues in the same strand; instead, H-bonds form between adjacent strands and are perpendicular to the direction of the strands. Two or more β-strands make a β-sheet. The amino acid sidechains are also directed outward from the strand and alternate between pointing up and down with respect to the plane of the sheet. In a β-sheet there are two possibilities for the relative orientation of the individual strands: the strands can run in the same direction, generating a parallel β-sheet, or in opposite directions, generating an antiparallel β-sheet.
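The α-helix dimensions listed above are linked: pitch = residues per turn × rise per residue (3.6 × 1.5 Å = 5.4 Å). A minimal sketch, using only those two numbers from the list, that estimates the number of turns and the axial length of an ideal α-helix with n residues; real helices deviate somewhat from these ideal values, and `helix_geometry` is a hypothetical helper name.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5  # Å along the helix axis


def helix_geometry(n_residues):
    """Return (turns, length in Å) for an ideal alpha-helix of n residues."""
    turns = n_residues / RESIDUES_PER_TURN
    length = n_residues * RISE_PER_RESIDUE
    return turns, length


pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE  # about 5.4 Å per turn
turns, length = helix_geometry(18)            # 5 turns, 27 Å long
```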
In both parallel and antiparallel β-sheets, the strands lie side by side and are aligned with each other; the two types differ only in the direction in which each strand runs from its N- to its C-terminus.

Ramachandran Plots

The phi and psi angles of every residue in a protein are neatly summarized in a Ramachandran plot. The horizontal and vertical axes represent the phi and psi angles of a residue, so a single point in the plot represents the phi and psi values of one residue. The contour lines surround regions of low energy, which correspond to the β-strand, αR-helix, and αL-helix secondary structures. High-energy regions result from unfavorable van der Waals interactions between sidechain atoms.

Non-regular secondary structures

Sharp turns in proteins, particularly at the ends of β-strands and in β-hairpins, have a characteristic geometry and sequence. As with other forms of secondary structure, these turns are stabilized by hydrogen bonding and favorable van der Waals interactions. These turns often contain glycine at position 3 (R3) because of its unique conformational properties. Use the Jmol activity to the right to determine the hydrogen-bonding pattern in a type II turn.
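As a rough illustration of reading a Ramachandran plot, the sketch below assigns a (Φ, Ψ) pair to whichever of the three low-energy conformations listed earlier in this section (αR: -60°/-45°, αL: +60°/+45°, β: -120°/+125°) is nearest, measuring angular differences with wraparound at ±180°. The nearest-centre rule and the 60° cutoff are illustrative assumptions, not an established secondary-structure assignment method; the real allowed regions are broad, irregular areas of the plot.

```python
import math

# Low-energy (phi, psi) pairs from the text, in degrees.
CENTRES = {
    "alpha_R": (-60.0, -45.0),
    "alpha_L": (60.0, 45.0),
    "beta": (-120.0, 125.0),
}


def angle_diff(a, b):
    """Smallest signed difference a - b in degrees, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def classify(phi, psi, cutoff=60.0):
    """Name the nearest low-energy centre, or 'other' if none is within cutoff."""
    best, best_dist = None, float("inf")
    for name, (c_phi, c_psi) in CENTRES.items():
        dist = math.hypot(angle_diff(phi, c_phi), angle_diff(psi, c_psi))
        if dist < best_dist:
            best, best_dist = name, dist
    return best if best_dist <= cutoff else "other"


helix_residue = classify(-57.0, -47.0)   # 'alpha_R'
strand_residue = classify(-140.0, 135.0)  # 'beta'
odd_residue = classify(180.0, 0.0)        # 'other'
```

The wraparound in `angle_diff` matters because -170° and +170° are only 20° apart on the circle, even though their numerical difference is 340°.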