Biochemistry 462a - Proteins: Primary Sequence
Reading - Chapter 5
Practice problems - Chapter 5 - 2,5; Proteins extra problems

Levels of Structure

The function of a protein can only be understood in terms of its structure. The three-dimensional structures of many proteins have been determined, and from these structures a few general principles can be derived. Protein structure is discussed in terms of four levels of organization:

Primary Structure is the amino acid sequence of the polypeptide chain(s). Every protein has a unique amino acid sequence.
Secondary Structure is the spatial arrangement of the polypeptide backbone, ignoring the conformation of the sidechains.
Tertiary Structure is the three-dimensional structure of the entire polypeptide.
Quaternary Structure refers to the three-dimensional structure of proteins that are composed of two or more polypeptide chains, called subunits.

Primary Structure

This is the primary structure of bovine insulin, which is composed of two polypeptide chains (A and B). The two polypeptide chains are joined by two interchain disulfide bonds; the A chain also contains an intrachain disulfide bond.

Determining the amino acid sequence of a protein used to be a very laborious and time-consuming process involving chemical and enzymatic degradation. Today, the amino acid sequence of a protein is usually determined from the nucleotide sequence of its gene, a relatively simple and rapid process.

Comparing the amino acid sequences of the same protein from many sources, e.g., cytochrome c, shows that some amino acid residues are conserved among all the proteins, whereas others are not. Such an analysis provides valuable information about amino acid residues that may be essential for a protein's function.

The importance of amino acid side chains: Real Life Example - sickle cell hemoglobin

Hemoglobin is the oxygen transport protein in blood. It is a tetramer containing two α and two β chains.
Hemoglobin exists in two states: an oxy form and a deoxy form. Several hundred mutant hemoglobins are known to exist. In most, a single amino acid replacement occurs in either the α or β chain of normal Hb A. Many of these changes cause no known effect, but several lead to pathologies associated with abnormal O2 transport. In sickle cell hemoglobin, HbS, a single Val replaces Glu at position 6 of the β chain. This seemingly innocuous change places a hydrophobic sidechain on the surface of the protein. In the deoxy conformation the Val sidechain of a β chain in one Hb binds to the β chain of another Hb. This leads to polymer formation and precipitation of the deoxy Hb, which in turn causes red cell lysis and anemia.

Amino Acid Composition

The amino acid composition is a fundamental characteristic of any protein. Hydrolysis of the protein in acid releases the amino acids, which are then quantitated using ion exchange chromatography in an automated amino acid analyzer. The amino acid peaks are detected using ninhydrin, which reacts with the free amino groups of amino acids to produce a purple color.

Amino Acid Sequence

The amino acid sequence of each protein is unique, and determination of the amino acid sequence is an important part of characterizing proteins. Today, most protein amino acid sequences are deduced from the sequences of their genes, because sequencing DNA is much easier than sequencing proteins. However, determination of protein sequences is still an important tool in Biochemistry. We use an automated process based on the Edman reaction, with chromatographic techniques to identify the PTH (phenylthiohydantoin) derivative of the amino acid released in each cycle. Although each cycle proceeds with a yield of > 90%, after about 25 cycles it becomes difficult to detect the newly released product. So a single Edman degradation is not able to determine the entire sequence of a protein. What is needed is a new amino terminus.
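The limit of about 25 cycles follows from simple compounding of the per-cycle yield. The sketch below is only an illustration: it assumes a constant yield of exactly 0.90 per cycle (the lower bound quoted above), and the function name fraction_in_register is made up for this example.

```python
def fraction_in_register(per_cycle_yield: float, cycles: int) -> float:
    """Fraction of chains that have reacted in every one of `cycles` Edman cycles.

    Assumes (for illustration only) the same yield in every cycle.
    """
    return per_cycle_yield ** cycles


# Assumed yield: 0.90 per cycle, the lower bound given in the notes.
for n in (1, 5, 10, 25, 50):
    print(f"after {n:2d} cycles: {fraction_in_register(0.90, n):.3f}")
```

With a 90% yield, only about 7% of the chains are still in step after 25 cycles, so the newly released PTH derivative is hard to see above the background from chains that have fallen behind. That fading signal is why longer proteins need a fresh amino terminus.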
New amino termini are generated by degrading the protein with a proteolytic enzyme, such as trypsin, which produces a number of peptides that can be separated and sequenced. Trypsin cleaves peptide bonds on the carboxyl side of Lys or Arg residues, as illustrated below. Chymotrypsin cleaves peptide bonds on the carboxyl side of Phe, Trp or Tyr residues. Other proteases have different specificities, which gives one a variety of ways to fragment the protein under investigation. The problem, of course, is that once the proteolysis has been accomplished and the peptides separated, you don't know how they were ordered in the original protein. Reestablishing the order is the big problem in protein sequencing.

Mass Spectrometry

Recently, mass spectrometry has become an important technique in peptide/protein chemistry. Mass spectrometers consist of three basic parts: an ion source that creates charged molecules in the gas phase; a mass analyzer that uses a physical property, e.g., time-of-flight (TOF), to separate ions; and a detector. Two important methods are used to create protein ions:

In matrix-assisted laser desorption ionization (MALDI), ions are created by using a laser to excite proteins in a crystalline matrix. MALDI is particularly suited for determining the molecular weight of proteins, often to accuracies of a few parts per million. The spectrum shown above illustrates the molecular masses of several peptides in a mixture.

In electrospray ionization (ESI), ions are created by applying a potential to a flowing liquid. This causes the liquid to spray and protein ions to be created. This method can also be used to measure molecular weight, but is most powerful when used in tandem MS/MS. A tandem mass spectrometer combines two mass analyzers with a method to energetically activate ions.
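The protease specificities given earlier (trypsin after Lys or Arg, chymotrypsin after Phe, Trp or Tyr) and the ordering problem they create can be made concrete with a toy digest. This is a minimal sketch, not a real sequencing workflow: the 15-residue sequence is invented, the helper names digest and bridges are hypothetical, and the cleavage rule is deliberately simplified (for example, it ignores the fact that a following proline generally blocks cleavage).

```python
def digest(sequence, cleave_after):
    """Cut `sequence` after every residue found in `cleave_after`.

    Simplified rule: every listed residue is cleaved, with no exceptions.
    """
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):          # C-terminal peptide without a cut site
        peptides.append(sequence[start:])
    return peptides


def bridges(peptide, left, right):
    """True if `peptide` contains the end of `left` joined to the start of `right`."""
    return any(
        left[-i:] + right[:j] in peptide
        for i in range(1, len(left) + 1)
        for j in range(1, len(right) + 1)
    )


TRYPSIN = set("KR")         # cleaves after Lys, Arg
CHYMOTRYPSIN = set("FWY")   # cleaves after Phe, Trp, Tyr

protein = "GAFKLMWERSTYVDK"  # invented example sequence
tryptic = digest(protein, TRYPSIN)
chymotryptic = digest(protein, CHYMOTRYPSIN)
print("tryptic:     ", tryptic)
print("chymotryptic:", chymotryptic)

# Which order of the first two tryptic peptides do the chymotryptic peptides support?
for left, right in [(tryptic[0], tryptic[1]), (tryptic[1], tryptic[0])]:
    support = [p for p in chymotryptic if bridges(p, left, right)]
    print(f"{left} then {right}: supported by {support}")
```

In this example the chymotryptic peptide KLMW spans the junction between the tryptic peptides GAFK and LMWER, so it supports that order and not the reverse. Overlaps this short would be weak evidence with real data, where longer overlapping peptides are needed. Peptides produced this way are also the usual input to a tandem mass spectrometer.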
In the first spectrometer a particular ion is isolated from all other ions that enter the mass analyzer (as marked above) and dissociated, and the m/z values of the dissociation products are determined in the second mass analyzer. The dissociation process causes covalent bonds to fragment. In the case of peptide ions, fragmentation predominates at or around the amide bond, creating a ladder of ions that is indicative of an amino acid sequence, as illustrated below.

Sequence Homology

Once the amino acid sequence of a protein has been determined, there are powerful computer programs that can be used to determine if the sequence is similar to other proteins. Such a search might give the results shown below.

#1 MKRTYQPNRRKRSKVHGFRARMSTKNGRKVLARRRRKGRKVLSA
#2 MKRTWQPSKLKHARVHGFRARMATKNGRKVIKARRAKGRVRLSA
#3 MKRTYQPSRVKRNRKFGFRARMKTKGGRLILSRRRAKGRMKLTV
#4 MKRTFQPSILKRNRSHGFRTRMATKNGRYILSRRRAKLRTRLTV
#5 MKRTYQPSKQKRNRTHGFRARMATKNGRQVLNRRRAKGRKRLTV
#6 TKRTFQPNNRRRARKHGFRARMRTRAGRAILSARRGKNRAELSA
#7 SKRTFQPNNRRRAKTHGFRLRMRTRAGRAILANRRAKGRASLSA
#8 GKRTFQPNNRRRARVHGFRLRMRTRAGRSIVSDRRRKGRRTLTA

The degree of identity between the sequences can be used to construct a distance matrix, which indicates how closely related the different sequences are. Here is one for cytochrome c from a variety of species.

Based on such a distance matrix, one can then construct a phylogenetic tree, as illustrated here for cytochrome c.

Genomics and Proteomics

There is a great deal of activity directed towards determining the complete sequence of the human genome (genomics), and several other genomes are also being sequenced; e.g., yeast has been done and the fruit fly Drosophila melanogaster will be finished soon. Once the complete sequence is finished, what do we do with the data? One thing is to figure out what the proteins encoded by the genome are and what they do (proteomics).
In many cases we can deduce the nature of the protein by homology to other proteins already sequenced, but in several cases (maybe >30%) we have no clue. We can use biotechnology techniques to produce the protein, which can then be purified and studied in order to try to deduce its function. One important approach is to determine its three-dimensional structure, which may give a clue to its function. The future of protein biochemistry is indeed exciting!
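As a concrete footnote to the Sequence Homology section, the sketch below builds a distance matrix from the eight search results #1 to #8 listed there. It rests on two assumptions that the notes do not state: the sequences are treated as already aligned position by position (they all have 44 residues, with no gaps), and distance is taken to be 1 minus the fraction of identical positions. The function names are hypothetical, and real homology programs use more elaborate alignment and scoring.

```python
# Sequences #1 to #8 from the Sequence Homology search results.
SEQUENCES = [
    "MKRTYQPNRRKRSKVHGFRARMSTKNGRKVLARRRRKGRKVLSA",
    "MKRTWQPSKLKHARVHGFRARMATKNGRKVIKARRAKGRVRLSA",
    "MKRTYQPSRVKRNRKFGFRARMKTKGGRLILSRRRAKGRMKLTV",
    "MKRTFQPSILKRNRSHGFRTRMATKNGRYILSRRRAKLRTRLTV",
    "MKRTYQPSKQKRNRTHGFRARMATKNGRQVLNRRRAKGRKRLTV",
    "TKRTFQPNNRRRARKHGFRARMRTRAGRAILSARRGKNRAELSA",
    "SKRTFQPNNRRRAKTHGFRLRMRTRAGRAILANRRAKGRASLSA",
    "GKRTFQPNNRRRARVHGFRLRMRTRAGRSIVSDRRRKGRRTLTA",
]


def fraction_identical(a: str, b: str) -> float:
    """Fraction of positions at which two pre-aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def distance_matrix(seqs):
    """Assumed distance measure: 1 - fraction of identical positions."""
    return [[1.0 - fraction_identical(a, b) for b in seqs] for a in seqs]


matrix = distance_matrix(SEQUENCES)
for label, row in enumerate(matrix, start=1):
    print(f"#{label} " + " ".join(f"{d:.2f}" for d in row))
```

Each row of the printed matrix gives the distance from one sequence to all the others; small entries mark closely related sequences, which is the information a tree-building method starts from.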