Uploaded by Anjelo Peren

biochem proteins

Amino Acids
Amino acids are the building blocks of proteins. The sequence of amino acids in individual
proteins is encoded in the DNA of the cell. The physical and chemical properties of the
20 different naturally occurring amino acids dictate the shape of the protein and its
interactions with its environment. Certain short sequences of amino acids in the protein
also dictate where the protein resides in the cell. Most proteins are composed of hundreds to
thousands of amino acids. As you can imagine, protein folding is a complicated process,
and there are many potential shapes due to the large number of combinations of amino
acids. By understanding the properties of the amino acids, you will gain an appreciation of
the limits of protein folding and of how to predict the potential higher order structure of the
protein.
All amino acids have the same backbone structure, with an amino group (the α-amino
group), a carboxyl group, an α-hydrogen, and one of a variety of functional groups (R), all
attached to the α-carbon. The atoms that are common to all amino acids are called
the mainchain or backbone atoms because they will form the mainchain of the protein
polymer.
The general structure of an α-amino acid is shown on the left. The amino group (blue)
has a pKa value of ~9, so it is protonated at pH 7.0. The carboxylic acid group (red) has
a pKa of ~2, so it is deprotonated at pH 7.0. The amino group and the carboxylic
acid are joined by the α-carbon. The α-carbon, α-proton, amino group and
carboxyl group are found in all amino acids; the R group varies from amino acid to amino
acid. The right structure shows the amino acid leucine, whose R group is a 2-methylpropyl (isobutyl) group.
The carbons of the sidechain are named with successive letters of the Greek alphabet: the β-carbon is
next to the α-carbon, the γ-carbon next to the β-carbon, and so on.
The R groups, which differ from one amino acid to the next, will form the sidechain
groups, because those atoms will project out to the side of the linear protein polymer.
Chirality
Because there are four different groups attached to the central carbon, the alpha carbon
is an asymmetric or chiral center (glycine, whose R group is a second hydrogen, is the exception).
This means that the molecule cannot be superimposed on its mirror image; the two
mirror-image forms are called enantiomers.
In order to determine the absolute configuration of a chiral center follow these
steps:
1. Label the four groups attached to the chiral center with priorities 1-4: group 1 has the
highest atomic number at the point of attachment and group 4 has the lowest. If two atoms
attached to the chiral center are the same, compare the atoms bonded to them in turn, e.g.
a C-C-H would have lower priority than C-C-OH.
2. Point the group labeled 4 away from you.
3. If 1,2,3 is counter-clockwise, the compound is S. S is Latin for "sinister", or left.
Imagine pointing the thumb of your left hand in the same direction as group 4; in
doing so your fingers curl in the counter-clockwise direction.
4. If 1,2,3 is clockwise, the compound is R. R is Latin for "rectus", or right. Imagine
pointing the thumb of your right hand in the same direction as group 4; in doing so
your fingers curl in the clockwise direction.
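When 3D coordinates of a chiral center are available, steps 2-4 can be checked numerically. The Python sketch below assumes the four groups have already been ranked by hand with step 1 (it does not apply the priority rules itself) and that the center is tetrahedral; the function name `assign_r_s` and the idealized coordinates are illustrative, not taken from any structure file. It uses the sign of the scalar triple product of the bond vectors to groups 1, 2 and 3: when 1→2→3 runs clockwise with group 4 pointing away from the viewer, the product is negative.

```python
def _sub(p, q):
    """Vector from point q to point p."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def assign_r_s(center, p1, p2, p3):
    """Return 'R' or 'S' for a tetrahedral chiral center (hypothetical helper).

    p1, p2, p3 are the coordinates of the groups already ranked 1, 2 and 3.
    Group 4 is not needed: in a tetrahedral center it lies opposite groups 1-3.
    """
    v1, v2, v3 = (_sub(p, center) for p in (p1, p2, p3))
    signed_volume = _dot(v1, _cross(v2, v3))
    if abs(signed_volume) < 1e-9:
        raise ValueError("groups 1-3 are coplanar with the center")
    # Clockwise 1 -> 2 -> 3 (group 4 away from the viewer) gives a negative triple product.
    return "R" if signed_volume < 0 else "S"


# Idealized tetrahedron: group 4 sits at (0, 0, -1), pointing away from a viewer on the +z axis.
center = (0.0, 0.0, 0.0)
g1 = (0.0, 0.943, 0.333)      # 12 o'clock as seen by the viewer
g2 = (0.816, -0.471, 0.333)   # about 4 o'clock
g3 = (-0.816, -0.471, 0.333)  # about 8 o'clock

print(assign_r_s(center, g1, g2, g3))  # R: 1 -> 2 -> 3 runs clockwise
print(assign_r_s(center, g1, g3, g2))  # S: swapping two groups gives the mirror image
```

Swapping any two groups reflects the center into its enantiomer, which is why the second call returns S.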
Although enantiomers have the same physical properties (apart from the direction in which
they rotate plane-polarized light), they generally have quite different biological
properties. In the case of amino acids and other biological compounds, the chirality of a
carbon is also indicated by another labeling scheme, D and L. This scheme is based on the
chirality of a reference compound, D- or L-glyceraldehyde.
Bioselectivity gives rise to the dominance of one form of chirality for amino acids in nature:
all common chiral amino acids have the same chirality as L-glyceraldehyde. Although most
L-amino acids are S, cysteine is an exception: its absolute configuration is R when the above
rules are applied, because the sulfur in its sidechain gives the -CH2SH group a higher
priority than the carboxyl group.
Sidechain properties: If all of the amino acids have the same basic structure with an
amino, a carboxyl and a hydrogen fixed to the alpha carbon, then the large variation in
the properties and structure of the amino acids must come from the fourth group attached
to the alpha carbon. This group is referred to as the sidechain of the amino acid or the R
group. The structures of the 20 common amino acids are shown below.
Common Amino Acids
Instructions: You should become familiar with the functional groups associated with the
sidechain atoms of each amino acid. You should be able to infer the properties of the side
chain from the 2D chemical diagram and the 3D structure. For example, which amino
acids have polar sidechains? Which have planar aromatic groups? You can review the
basic functional groups that were discussed in the first lecture.
The structures of the 20 common amino acids are shown in the table, beginning with the
simplest amino acid, glycine.
The mainchain atoms of glycine are highlighted in yellow and its sidechain (H) is
highlighted in green. All amino acids have the same mainchain atoms, but differ in the
sidechains. For clarity, the α-hydrogen is omitted in the remaining drawings.
Non-polar amino acids are highlighted in grey,
Aromatic amino acids are highlighted in cyan,
Polar amino acids are highlighted in purple,
Amino acids with acidic sidechains are highlighted in red, and
Amino acids with basic sidechains are highlighted in blue.
The amino acids cysteine and proline, which are shown at the bottom of the page, have
unique properties:
Cysteine can form a covalent S-S (disulfide) bond with another cysteine, stabilizing the
protein structure by crosslinking.
Proline: the sidechain attaches to its own α-amino nitrogen, giving a secondary amine.
Acid-Base Properties of Amino Acids and Their Side-Chains
The ionization properties of all amino acids depend on the mainchain amino and carboxyl
groups, so every amino acid has at least two ionizable groups. The
sidechains of a number of amino acids have pKa values in the range of 2-12 and thus
can potentially ionize in biochemical systems. The structures of these sidechains, along
with their pKa values, are shown in the figure below. At neutral pH (7.0) the mainchain
amino group of an amino acid is positively charged and the mainchain carboxyl group is
negatively charged. For those amino acids with uncharged sidechains, the positive
charge on the mainchain amino group cancels the negative charge on the carboxyl
group, giving a net charge of zero. A molecule that carries both positive and negative
charges but has no net charge is called a zwitterion.
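The overall protonation state of an amino acid depends on the pH of the solution and the
pKa values of its ionizable groups. The overall charge at any pH can be calculated by
summing over its n ionizable groups:
$$q_{\text{Total}} = \sum_{i=1}^{n} \left( f_{\text{HA},i}\, q_{\text{HA},i} + f_{\text{A}^-,i}\, q_{\text{A}^-,i} \right)$$
where $f_{\text{HA},i}$ and $f_{\text{A}^-,i}$ are the fractions of group i in its protonated (HA)
and deprotonated (A⁻) forms, and $q_{\text{HA},i}$ and $q_{\text{A}^-,i}$ are the charges of those two forms.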
An example of calculating the charge on the amino acid glycine is given in the previous
lecture.
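The charge formula can be evaluated directly once each fraction is written with the Henderson-Hasselbalch equation, f_HA = 1/(1 + 10^(pH - pKa)) and f_A- = 1 - f_HA. The sketch below is a minimal illustration, not the calculation from the previous lecture; the function names are hypothetical, and the glycine pKa values are the rounded ~9 and ~2 used in this section rather than measured constants.

```python
def fraction_protonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of a group in its protonated (HA) form."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


def net_charge(pH: float, groups) -> float:
    """q_Total = sum over groups of f_HA * q_HA + f_A * q_A.

    Each group is a tuple (pKa, charge_when_protonated, charge_when_deprotonated).
    """
    total = 0.0
    for pKa, q_ha, q_a in groups:
        f_ha = fraction_protonated(pH, pKa)
        total += f_ha * q_ha + (1.0 - f_ha) * q_a
    return total


# Glycine with the approximate pKa values quoted in this section (assumed, not measured):
GLYCINE = [
    (9.0, +1, 0),   # alpha-amino group: NH3+ (charge +1) -> NH2 (charge 0)
    (2.0, 0, -1),   # alpha-carboxyl group: COOH (charge 0) -> COO- (charge -1)
]

for pH in (1.0, 7.0, 12.0):
    print(pH, round(net_charge(pH, GLYCINE), 3))
```

At pH 7 the result is close to zero, as expected for the zwitterion; at pH 1 it is about +0.9 and at pH 12 about -1.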
The isoelectric pH (pI) is the pH where a molecule (amino acid, protein, etc.) has no net
charge. Using the above formula, it is possible to show that the pI for an amino acid that
does not have an ionizable sidechain is the average of the pKa for the amino and carboxyl
groups: pI = (1/2)(pKaNH2+pKaCOOH). This simple formula does not apply to amino
acids with ionizable side chains, in which case the general formula for calculating the
charge (shown above) has to be used to find the pH where the average charge on the
molecule is zero.
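Because the net charge falls steadily as the pH rises, the pH at which it crosses zero can be found by bisection. This sketch uses illustrative, rounded pKa values (assumptions, not measured data) and a hypothetical acidic sidechain with pKa 4.0; it confirms the averaging rule for the simple case and shows that the rule no longer applies once a sidechain ionizes. The function names are hypothetical.

```python
def net_charge(pH, groups):
    """Sum of f_HA*q_HA + f_A*q_A over (pKa, q_protonated, q_deprotonated) groups."""
    total = 0.0
    for pKa, q_ha, q_a in groups:
        f_ha = 1.0 / (1.0 + 10 ** (pH - pKa))   # Henderson-Hasselbalch
        total += f_ha * q_ha + (1.0 - f_ha) * q_a
    return total


def isoelectric_point(groups, lo=0.0, hi=14.0, tol=1e-6):
    """Find the pH where net_charge == 0 by bisection on [lo, hi].

    Net charge decreases monotonically with pH, so one sign change is bracketed.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(mid, groups) > 0:
            lo = mid   # still positive: the pI lies at higher pH
        else:
            hi = mid   # zero or negative: the pI lies at lower pH
    return 0.5 * (lo + hi)


# Illustrative pKa values (assumptions chosen to match the rounded values in the text).
no_sidechain = [(9.0, +1, 0), (2.0, 0, -1)]
acidic_sidechain = no_sidechain + [(4.0, 0, -1)]   # hypothetical carboxylate sidechain

print(round(isoelectric_point(no_sidechain), 2))      # 5.5, i.e. (9.0 + 2.0) / 2
print(round(isoelectric_point(acidic_sidechain), 2))  # about 3.0, not 5.5
```

For the acidic example the pI lands near 3.0, the average of the two carboxyl pKa values, because the neutral form exists between those two ionization steps.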
Ionizable Amino acids
The protonated (left) and deprotonated (right) forms of ionizable amino acids are shown.
Note that the acidic residues become negatively charged when they lose a proton, while the
basic residues become neutral.
UV Absorption of Amino Acids
Tryptophan (Trp), tyrosine (Tyr), and phenylalanine (Phe) contain conjugated aromatic
rings. Consequently, they absorb light in the ultraviolet (UV) range. Light absorption is
quantified by the absorbance, A = log10(I0/I), where I0 is the intensity of the
incident light and I is the intensity of the light that leaves the sample. If no light is absorbed,
then I = I0 and A = 0. As the intensity of the transmitted light decreases, A increases.
The amount of light that is absorbed by a solution of chromophores is characterized by
the molar extinction coefficient, ε. This is the absorbance that would be measured for a 1
molar solution in a cell with a 1 cm pathlength. The extinction coefficients for Trp, Tyr,
and Phe are listed below.
Amino Acid    Extinction Coefficient ε (λmax)
Trp           5,050 M⁻¹cm⁻¹ (280 nm)
Tyr           1,440 M⁻¹cm⁻¹ (274 nm)
Phe           220 M⁻¹cm⁻¹ (257 nm)
Due to the dominance of the absorption by Trp residues, most proteins show a maximum
light absorbance at a wavelength of 280 nm.
The amount of light absorbed by a solution of concentration [X] is given by the Beer-Lambert
law: A = εl[X], where [X] is the concentration of the absorbing species in
mol/L, and l is the pathlength of the light (usually 1 cm). Given a known extinction
coefficient, it is possible to measure the concentration of a protein. Note that
absorbance values above ~3.0 can seldom be measured accurately; at A = 3.0 only 0.1% of
the incident light is transmitted, so very little light reaches the detector in the
instrument.
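Rearranging the Beer-Lambert law gives the concentration directly, [X] = A/(εl). A minimal sketch, assuming a 1 cm cell by default and using the ~3.0 ceiling mentioned above as a hard cutoff (the cutoff value as a constant and the function names are illustrative choices):

```python
MAX_RELIABLE_A = 3.0  # approximate upper limit for accurate readings, as stated above


def concentration_molar(absorbance: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Rearrange A = epsilon * l * [X] to [X] = A / (epsilon * l), in mol/L."""
    if not 0.0 <= absorbance <= MAX_RELIABLE_A:
        raise ValueError("absorbance outside the reliably measurable range; dilute the sample")
    return absorbance / (epsilon * path_cm)


def percent_transmitted(absorbance: float) -> float:
    """Since A = log10(I0 / I), the transmitted fraction I / I0 is 10**(-A)."""
    return 100.0 * 10 ** (-absorbance)


# A Trp solution (epsilon = 5,050 M^-1 cm^-1 at 280 nm) reading A = 0.505 in a 1 cm cell:
print(concentration_molar(0.505, 5050.0))  # about 1e-4 M (0.1 mM)
print(percent_transmitted(3.0))            # about 0.1 (percent of incident light)
```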
Calculation of molar extinction coefficients: If a protein contains a mixture of N different
chromophores, the absorbance can generally be assumed to be additive. Consequently
the molar extinction coefficient for the entire protein is:
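$$\varepsilon = \sum_{i \,\in\, \text{absorbing groups}} \varepsilon_i n_i = \varepsilon_{\text{Trp}} n_{\text{Trp}} + \varepsilon_{\text{Tyr}} n_{\text{Tyr}} + \varepsilon_{\text{Phe}} n_{\text{Phe}}$$
where $n_i$ is the number of residues of chromophore type i in the protein.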
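The sum above amounts to counting aromatic residues in the sequence. This sketch plugs in the tabulated values exactly as the formula does; strictly, every ε_i should be taken at the wavelength at which A is measured, whereas the table lists each residue at its own λmax. The peptide sequence and the function name are made up for illustration.

```python
# Molar extinction coefficients from the table above (M^-1 cm^-1), keyed by one-letter code.
EPSILON = {"W": 5050.0, "Y": 1440.0, "F": 220.0}


def protein_epsilon(sequence: str) -> float:
    """epsilon = sum_i epsilon_i * n_i over the aromatic residues of a one-letter sequence."""
    seq = sequence.upper()
    return sum(eps * seq.count(code) for code, eps in EPSILON.items())


# Hypothetical peptide with 1 Trp, 2 Tyr and 1 Phe:
print(protein_epsilon("MWYYFK"))  # 5050 + 2*1440 + 220 = 8150.0
```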
Peptide Bond
Proteins are polymers of amino acids. The amino acids are joined together by a
condensation reaction similar to that described for the formation of the glycosidic bond in
polysaccharides. Each amino acid in the polymer is referred to as a residue. Individual
amino acids are joined together by the attack of the nitrogen of an amino group of one
amino acid on the carbonyl carbon of the carboxyl group of another to create a covalent
peptide bond and yield a molecule of water, as shown below.
Peptide bond formation occurs by a dehydration reaction. The amino group of the second
amino acid attacks the carbonyl carbon of the first, forming the peptide bond and releasing
water. The resultant dipeptide has an amino terminus (left) and a carboxy-terminus (right).
The mainchain atoms, which are the same for each residue in the peptide, include the
nitrogen and its proton, the α-carbon and its hydrogen, and the C=O group. The R-groups
form the sidechain atoms.
The resulting peptide chain is linear, defined by the mechanism that builds the polymer,
and has defined ends. Short polymers (< 50 residues or amino acids) are usually referred
to as peptides, and longer polymers as proteins. Because the synthesis takes place from
the alpha amino group of one amino acid to the carboxyl group of another amino acid,
the result is that there will always be a free amino group on one end of the growing
polymer (the N-terminus) and a free carboxyl group on the other end (the C-terminus).
Note that the potential exists for the formation of amide (peptide) links involving the
carboxyl and amino groups in the side chains, but bioselectivity directs the synthesis to
be linear, involving only the alpha amino and alpha carboxyl groups.
Note that after an amino acid has been incorporated into the protein, the charges on its
α-amino and α-carboxyl groups disappear (except at the free N- and C-termini of the chain),
because these groups now form uncharged amide linkages. Thus the mainchain atoms have become
uncharged but polar functional groups. Since each residue in a protein has exactly the same
mainchain atoms, the functional properties of a protein must arise from the different
sidechain groups.
By convention, the sequences of peptides and proteins are written with the N-terminus
on the left and the C-terminus on the right: the N-terminal residue is always named first,
followed by each subsequent residue in order. The primary sequence of
a protein refers to its amino acid sequence.
OVERVIEW OF PROTEIN STRUCTURE
A protein is composed of amino acids attached in a linear order. This basic level
of protein structure is called its primary structure and derives from the formation of
peptide bonds between the individual amino acids. Each amino acid in the linear polymer
is referred to as a residue. The order, or sequence, of the amino acids is determined by
information encoded in the cell's genes. An example of a protein sequence is shown
below where the one letter abbreviations are used for each of the 20 amino acids used in
cellular protein synthesis.
Amino acid sequence of Human Estrogen Receptor
Amino acids are indicated using the single letter code.
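Sequences in these notes appear both in three-letter form (as in the sequencing examples) and in the single letter code. A small conversion sketch; the dictionary holds the standard abbreviations for the 20 common amino acids, and the function name is hypothetical:

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def three_to_one(sequence: str) -> str:
    """Convert a hyphen-separated, N-to-C three-letter sequence to the single letter code."""
    return "".join(THREE_TO_ONE[res.strip().capitalize()] for res in sequence.split("-"))


print(three_to_one("Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu"))
# AGMSTGVVKGSAFL
```

Because the N-to-C convention is preserved, the first letter of the output is always the N-terminal residue.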
Higher order structure is determined by the Primary Structure
Proteins do not exist as linear threads in the cells but rather as spontaneously
folded higher order structures. The higher order structure is determined by the amino
acids in the primary structure. Usually the sequence alone is sufficient to generate higher
order structures, but some proteins require chaperones to help them fold.
The stages or levels of protein structure are:
Primary Structure: The amino acid sequence of the protein, with no regard for the
conformation of the amino acids.
Secondary Structure: interactions involving only mainchain (also known as backbone)
atoms resulting in α-helices and β-sheets. Mainchain atoms are the N-Cα-C=O atoms
that form the backbone of the protein polymer.
Tertiary Structure: long range interactions resulting in the 3-D folding of a single
polypeptide chain.
Quaternary Structure: The interaction of two or more polypeptide chains to make a
functional protein.
Proteins with quaternary structure are named by the number and type of their chains:
a homodimer contains two identical chains, represented as α2
a homotrimer contains three identical chains, represented as α3
a heterodimer contains two different chains, represented as αβ
a heterotrimer can contain two identical chains (e.g. α2β), or three different chains,
as in αβγ
a heterotetramer often contains two pairs of identical chains, such as in α2β2, but
can contain four different chains, e.g. αβγδ
Example - Structure Hierarchy in Hemoglobin
The oxygen transport protein, hemoglobin, is shown here. The heme groups, which
are colored purple, are responsible for binding the oxygen. The protein component of
hemoglobin is colored gray. Hemoglobin looks complicated, but we can understand its
structure using a hierarchical description of the structure.
Hemoglobin
Primary Structure is the sequence of amino acids. Hemoglobin has four separate
polypeptide chains: two α chains and two β chains (α2β2).
Secondary Structure describes the local structure of just the main chain atoms. Each
subunit of hemoglobin contains a number of alpha-helical secondary structural elements.
Tertiary Structure is the complete description of the structure of both the mainchain
and sidechain atoms of one polypeptide chain, such as a single subunit of hemoglobin.
Of course, the tertiary structure is built up from secondary structural elements.
Quaternary Structure is the complete description of the structure of all of the different
polypeptide chains that comprise the functional molecule, here the complete hemoglobin
tetramer. The quaternary structure is in turn built up from the tertiary structures of the
individual subunits.
Determining Primary Structure
We will focus on N-terminal sequencing of the protein itself using Edman degradation.
Fragmentation of the peptide may be required in the case of larger proteins. Note that
protein sequences can also be inferred from the DNA sequence, or determined experimentally
using mass spectrometry.
Edman Degradation: The detailed chemical mechanism of Edman degradation will not
be discussed; however, an overview of the Edman chemistry is shown here:
Sequencing long Proteins: It is generally not possible to sequence an entire protein
from the amino terminus. To extend the sequence information the protein is fragmented
into smaller peptides. After cleavage, the individual peptide fragments are separated from
each other and each is independently subjected to N-terminal sequencing using the Edman
degradation method. Three common fragmentation reactions are:
Cyanogen bromide (CNBr) cleaves the peptide bond after methionine residues. As an
example:
Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile --(CNBr)--> Ser-Met + Gly-Ala-Phe-Arg-Leu-Ile
Chymotrypsin hydrolyzes the peptide bonds that follow large hydrophobic residues, e.g.
phenylalanine, tyrosine, tryptophan. As an example:
Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile --(Chymotrypsin)--> Ser-Met-Gly-Ala-Phe + Arg-Leu-Ile
Trypsin hydrolyzes the peptide bonds that follow positively charged residues, e.g.
lysine and arginine. As an example:
Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile --(Trypsin)--> Ser-Met-Gly-Ala-Phe-Arg + Leu-Ile
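The three cleavage rules can be expressed as a lookup table of residues after which each reagent cuts. This sketch encodes only the simplified rules stated above (real enzymes have additional sequence preferences that are ignored here), and the names `CUT_AFTER` and `cleave` are illustrative. Running it on the example peptide reproduces the three reactions above.

```python
# Simplified cleavage rules from the text: each reagent cuts the peptide bond
# on the C-terminal side of the listed residues.
CUT_AFTER = {
    "CNBr": {"Met"},
    "chymotrypsin": {"Phe", "Tyr", "Trp"},
    "trypsin": {"Lys", "Arg"},
}


def cleave(peptide: str, reagent: str) -> list[str]:
    """Split a hyphen-separated three-letter peptide into fragments, N- to C-terminus."""
    sites = CUT_AFTER[reagent]
    fragments, current = [], []
    for residue in peptide.split("-"):
        current.append(residue)
        if residue in sites:              # cut after this residue
            fragments.append("-".join(current))
            current = []
    if current:                           # C-terminal piece with no cut after it
        fragments.append("-".join(current))
    return fragments


peptide = "Ser-Met-Gly-Ala-Phe-Arg-Leu-Ile"
for reagent in CUT_AFTER:
    print(reagent, cleave(peptide, reagent))
```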
If only two fragments are produced by the cleavage reaction, then it is straightforward to
reconstruct the sequence, because the amino-terminal sequence of the intact protein
identifies which fragment comes first. However, if the original protein is cleaved into
three or more fragments, then it is not possible to determine the correct order of the
fragments using a single cleavage agent. Overlapping fragments generated with different
cleavage agents have to be used to determine the correct ordering, as illustrated
below.
Sequence Determination
Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu
In this example I have assumed that 6 cycles of Edman degradation are possible.
After that, impurities and side reactions prevent the reliable identification of the amino
acid. Note that in practice 30-100 cycles can be accomplished, giving the sequence of
the first 30-100 residues of the protein.
A: The first six cycles of Edman degradation produced Ala, Gly, Met, Ser, Thr, and Gly, in
that order. Therefore the amino-terminal sequence is:
Ala-Gly-Met-Ser-Thr-Gly
B: A new sample of the peptide was treated with CNBr. The two peptides (CNBr-1, CNBr-2)
that were produced were isolated and each was subjected to Edman degradation, giving
the following sequences. (Only the first six residues of each peptide can be identified
by Edman degradation; the remainder of the peptide is present, but not detectable.)
CNBr-1: Ala-Gly-Met
CNBr-2: Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu
C: A new sample of the peptide was treated with trypsin. The two peptides (Trp1, Trp2)
that were produced were isolated and each was subjected to Edman degradation. The
sequences of these two peptides were:
Trp1: Gly-Ser-Ala-Phe-Leu
Trp2: Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys
Strategy: Find overlaps between fragments obtained with different cleavage reagents and
use these overlaps to correctly order the peptides obtained from one cleavage reaction.
The overlaps can be readily identified by finding a cleavage site in a peptide that would
be cut by another cleavage reagent (e.g. trypsin) and then identifying the correct
fragment based on the expected amino-terminal sequence. For example, the sequence
from the Edman degradation of the intact peptide contains a Met residue, so you would
look for overlaps between the intact sequence and the two CNBr fragments:
Intact peptide: Ala-Gly-Met-Ser-Thr-Gly
CNBr-1:         Ala-Gly-Met
CNBr-2:                     Ser-Thr-Gly-Val-Val-Lys
Combine to give: Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys
The partial sequence above ends in a Lys residue, which is a trypsin cleavage site. The
trypsin fragment that begins with the amino-terminal sequence (Trp2) must therefore end at
this Lys, and the other fragment, Trp1, must begin immediately after it. This allows
completion of the sequence:
Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu
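The overlap strategy can be sketched in a few lines. The code below assumes, as in the example, that Edman degradation reads only the first six residues of any peptide; it simulates those reads from the answer sequence, so it illustrates the logic rather than a real analysis pipeline. `edman_read` and `merge_on_overlap` are hypothetical helpers, and the minimum overlap of two residues is an arbitrary choice.

```python
from typing import List, Optional


def edman_read(fragment: str, cycles: int = 6) -> List[str]:
    """Residues identifiable by Edman degradation: only the first `cycles` from the N-terminus."""
    return fragment.split("-")[:cycles]


def merge_on_overlap(left: List[str], right: List[str], min_overlap: int = 2) -> Optional[List[str]]:
    """Join two reads if a suffix of `left` equals a prefix of `right` (longest overlap first)."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


intact = edman_read("Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu")
cnbr2 = edman_read("Ser-Thr-Gly-Val-Val-Lys-Gly-Ser-Ala-Phe-Leu")

# Step 1: the intact read overlaps the CNBr-2 read on Ser-Thr-Gly.
partial = merge_on_overlap(intact, cnbr2)

# Step 2: the partial sequence ends in Lys, a trypsin site. The tryptic fragment that does
# not begin with the N-terminal read must be the one that follows this Lys.
trypsin = {"Trp1": "Gly-Ser-Ala-Phe-Leu", "Trp2": "Ala-Gly-Met-Ser-Thr-Gly-Val-Val-Lys"}
downstream = [seq for seq in trypsin.values() if edman_read(seq) != intact]
full = partial + downstream[0].split("-")

print("-".join(full))
```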
Before discussing secondary structure, it is important to appreciate the
conformational plasticity of proteins. Each residue in a polypeptide has three bonds
connecting mainchain atoms that are potentially free to rotate. The conformation of the
atoms involved in these bonds describes the secondary structure of the protein. The
rotation angle about a bond is referred to as a torsional angle. A torsional angle defines
the relative orientation of four atoms in space: it is the angle between the plane
containing the first three atoms and the plane containing the last three. The
torsional angle about the N-Cα bond is shown below.
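A torsional angle can be computed from the Cartesian coordinates of its four atoms. This sketch uses the common atan2 formulation, which returns angles between -180° and +180°, positive when, looking along the central bond, the near bond must be turned clockwise to eclipse the far bond. The function name is hypothetical, and the coordinates are toy values chosen to give cis and trans arrangements, not atoms from a real protein.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals to the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# For phi pass C(i-1), N(i), CA(i), C(i); for psi pass N(i), CA(i), C(i), N(i+1).
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # 180.0 (trans)
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # 0.0 (cis)
```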
Conformation of the Ci-1 - N (Peptide bond). The carbonyl oxygen, carbonyl carbon,
nitrogen, and amide hydrogen that surround this bond are planar due to the hybridization
properties of the carbonyl carbon and the nitrogen (both sp2). In addition, free rotation
about the bond is not possible since the pz orbitals of oxygen, carbon, and nitrogen form
a delocalized system. Rotation about the peptide bond would break the interaction between
the pz orbitals of the nitrogen and carbon atoms, and is therefore unfavorable. The peptide
bond is said to be a "partial double bond".
The atomic orbitals of the mainchain atoms are shown. The carbonyl carbon uses an
sp2 hybrid orbital to bond to the carbonyl oxygen and the nitrogen. Consequently, the
oxygen, carbon, and nitrogen all lie in the same plane. Since the nitrogen also uses
sp2 hybrid orbitals, the amide hydrogen is also in the same plane. The second bond
between the carbonyl carbon and oxygen is formed by overlap of the pz orbitals. The
nitrogen pz orbital is also in a favorable position to overlap with the carbon pz orbital.
Consequently, the electrons in all three pz orbitals form a delocalized system resulting in
a partial double bond between the carbonyl carbon and the nitrogen.
Cis and Trans Peptide Bonds: Two orientations of the peptide bond that maintain a
favorable pz interaction between the carbon and nitrogen are possible. They
are related by a 180° flip of the peptide bond, generating the trans form and the cis form.
For all peptide bonds, the trans form is more stable than the cis form. The higher energy
of the cis form is due, in part, to overlap between the α-protons on adjacent residues. In
the case of proline, the trans form is also destabilized by overlap between the α-proton
of the preceding residue and the δ-protons of proline.
Ala-Ala Trans and Cis Peptide Bonds.
In the case of linkages between non-proline residues, the unfavorable
overlap of the mainchain (and sidechain) atoms makes the cis form less
stable by about 15 kJ/mol, giving a ratio of trans to cis of about 1000:1. The
molecular crowding of the α-hydrogens in the cis form is evident in the
image on the far right.
Ala-Pro Trans and Cis Peptide Bonds.
Both the trans and cis forms of the
peptide bond result in overlap of atoms, raising the energy of the trans form so
that it is only ~4 kJ/mol lower than that of the cis form. In the trans form the
molecular crowding involves the α-hydrogen of the preceding residue and
the δ-hydrogens on the proline. In the cis form it is the two α-hydrogens.
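The energy gaps and population ratios quoted above are linked by a two-state Boltzmann relation, [trans]/[cis] = exp(ΔG/RT), with ΔG = G(cis) - G(trans). The sketch assumes 25 °C (298.15 K) and treats each linkage as a simple two-state equilibrium; the function names are hypothetical.

```python
import math

GAS_CONSTANT = 8.314   # J mol^-1 K^-1
TEMPERATURE = 298.15   # K, i.e. 25 degrees C (assumed)


def trans_to_cis_ratio(delta_g_kj: float, temp_k: float = TEMPERATURE) -> float:
    """[trans]/[cis] = exp(dG / RT), where dG = G_cis - G_trans in kJ/mol."""
    return math.exp(delta_g_kj * 1000.0 / (GAS_CONSTANT * temp_k))


def delta_g_for_ratio(ratio: float, temp_k: float = TEMPERATURE) -> float:
    """Energy gap (kJ/mol) that would produce a given trans:cis ratio."""
    return GAS_CONSTANT * temp_k * math.log(ratio) / 1000.0


print(round(trans_to_cis_ratio(15.0)))      # non-proline linkage, ~15 kJ/mol
print(round(trans_to_cis_ratio(4.0), 1))    # X-Pro linkage, ~4 kJ/mol
print(round(delta_g_for_ratio(1000.0), 1))  # gap implied by a 1000:1 ratio
```

Under these assumptions, 15 kJ/mol corresponds to a trans:cis ratio of roughly 420:1 and 4 kJ/mol to about 5:1 (roughly 17% cis), while a 1000:1 ratio corresponds to about 17 kJ/mol; the figures in the text are best read as order-of-magnitude estimates.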
N - Cα & Cα - C Bonds: The torsional angles associated with each of these bonds are
defined as:
Φ (Phi), the torsional angle about the bond between N and Cα
Ψ (Psi), the torsional angle about the bond between Cα and C.
There is free rotation about both of these bonds, but not all torsional angles are equally
likely. Three torsional angles are more stable than the others; they are related to
each other by 120° rotations about the bond. The three stable positions are more easily
seen with a simpler molecule, such as 1-chloro-1-fluoroethane. Note that the three stable
(staggered) conformations minimize the interaction between the atoms on each carbon by
maximizing their distance from each other. These conformations are also stabilized by
favorable interactions between the molecular orbitals in the molecule.
In the case of amino acid residues in a protein, the presence of the bulky atoms on the
sidechain restricts the possible phi and psi angles of a residue to three pairs of values
that are relatively low in energy:
Three possible phi/psi angles of a residue
Φ = -60°, Ψ = -45°
Φ = +60°, Ψ = +45°
Φ = -120°, Ψ = +125°
Secondary Structures
Proteins consist of a linear chain of amino acids, with each amino acid
representing a building block. The shape of each block depends on the Φ and Ψ angles of
each residue. In regular secondary structures these angles are the same for each
residue, and thus the shape of each building block, or amino acid, is the same.
If a series of identically shaped objects are laid end-to-end, they will form some type of
geometrical structure. In two dimensions there are two possibilities: either a straight
chain or some type of circle (which may or may not be closed). A straight chain occurs if
there is no curvature in the block, while a circle will result if there is any degree of
curvature. The radius of the circle is related to the degree of curvature: the greater the
curvature, the smaller the circle.
In the case of three dimensional building blocks that have some degree of
curvature on both faces, the two dimensional circular structure becomes a helix. If the
building block is a perfect rectangular prism, then the structure will remain linear.
Given the possible values of Φ and Ψ angles, many different shapes of the amino acid
building block are possible, and therefore many different three dimensional structures
are possible. Only two are commonly observed in proteins: the right-handed alpha helix
and beta-structures. A left-handed alpha helix is also stable, but relatively rare. These
conformations are stable because they:
• Maximize mainchain hydrogen bonding
• Maximize van der Waals interactions of mainchain atoms
• Minimize steric clashes of mainchain and sidechain atoms
In all cases these secondary structures of proteins have characteristic values of the Φ
and Ψ torsional angles that are the same for each residue within a particular secondary
structure, and each peptide bond is rigid and planar in the trans conformation.
Alpha helix (Φ = -60°, Ψ = -45°)
1. Dimensions, geometry, & H-bonds:
• 3.6 residues/turn
• pitch = 5.4 Å/turn
• rise/residue = 1.5 Å
2. H-bonds are parallel to the helix axis (the C=O of residue i bonds to the N-H of residue i+4).
3. Sidechains point outwards.
4. The right-handed form is more stable.
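The helix dimensions are related by simple arithmetic: pitch = residues per turn × rise per residue, and each residue advances 360°/3.6 = 100° around the helix axis. A short sketch using the values listed above (the function names and the 18-residue helix are arbitrary examples):

```python
RESIDUES_PER_TURN = 3.6   # from the list above
RISE_PER_RESIDUE = 1.5    # angstrom along the helix axis, from the list above


def helix_length(n_residues: int) -> float:
    """Length of an ideal alpha helix along its axis, in angstrom."""
    return n_residues * RISE_PER_RESIDUE


def helix_turns(n_residues: int) -> float:
    """Number of turns made by an ideal alpha helix of n residues."""
    return n_residues / RESIDUES_PER_TURN


pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE      # angstrom per turn
degrees_per_residue = 360.0 / RESIDUES_PER_TURN   # rotation between successive sidechains

print(round(pitch, 2))                              # 5.4
print(round(degrees_per_residue, 1))                # 100.0
print(helix_length(18), round(helix_turns(18), 2))  # 27.0 5.0
```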
Beta Structures (Φ = -120°, Ψ = 125°)
These particular phi and psi torsional angles generate a building block that is
almost a perfect rectangular prism. Consequently, beta structures consist of straight, fully
extended strands of linked amino acids, which are called beta-strands. Due to the
extended nature of the polypeptide, it is not possible to form mainchain hydrogen bonds
between residues within the same strand. Consequently, H-bonds form between adjacent
strands and are perpendicular to the direction of the strands. Two or more beta-strands
make a beta-sheet. The amino acid sidechains are also directed outward from the strand and
alternate between pointing upwards and downwards with respect to the plane of the
sheet. In a beta-sheet there are two possibilities for the relative orientation of the individual
strands. The strands can run in the same direction, generating a parallel beta-sheet, or
they can run in opposite directions, generating an anti-parallel beta-sheet. In both types
of sheet the strands lie side by side with their axes aligned; "parallel" and "anti-parallel"
refer only to their N-to-C direction.
Ramachandran Plots
The phi and psi angles for each residue in a protein are neatly summarized in a
Ramachandran plot. The horizontal and vertical axes represent the phi and psi angles of
a residue, respectively. A single point in the plot represents the phi and psi values of one
residue. The contour lines surround regions of low energy and correspond to β-strand,
αR helix, or αL helix secondary structures. High energy regions result from unfavorable
van der Waals interactions (steric clashes) involving mainchain and sidechain atoms.
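A very crude way to place a residue on the plot is to find the nearest of the three low-energy (Φ, Ψ) pairs listed earlier, measuring angular differences around the 360° circle. This is only an illustrative sketch: the 60° cutoff, the region names and the function names are arbitrary assumptions, and real Ramachandran analyses use empirically derived contours rather than single points.

```python
import math

# Low-energy (phi, psi) pairs listed earlier in this section, in degrees.
REGIONS = {
    "alpha_R": (-60.0, -45.0),
    "alpha_L": (60.0, 45.0),
    "beta": (-120.0, 125.0),
}


def angle_diff(a: float, b: float) -> float:
    """Smallest signed difference a - b on a 360-degree circle, in [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def classify(phi: float, psi: float, max_distance: float = 60.0) -> str:
    """Assign a residue to the nearest listed region, or 'other' if none is within the cutoff."""
    best, best_d = "other", max_distance
    for name, (phi0, psi0) in REGIONS.items():
        d = math.hypot(angle_diff(phi, phi0), angle_diff(psi, psi0))
        if d < best_d:
            best, best_d = name, d
    return best


for phi, psi in [(-57, -47), (-140, 135), (55, 40), (0, 180)]:
    print(phi, psi, classify(phi, psi))
```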
Non-regular secondary structures
Sharp turns in proteins, particularly at the ends of beta-strands and beta-hairpins, have
a characteristic geometry and sequence. As with other forms of
secondary structure, these turns are stabilized by hydrogen
bonding and favorable van der Waals interactions. These turns often
contain glycine at position 3 (R3) because of its unique
conformational properties: lacking a sidechain, glycine can adopt conformations, such as
positive Φ angles, that are unfavorable for other residues. In a β-turn, including the
type II turn, the C=O of the first residue hydrogen bonds to the N-H of the fourth residue.