PROTEIN SEQUENCING

First Sequence
• The first protein sequence was determined by Frederick Sanger in 1953: the amino acid sequence of bovine insulin
• Sanger was awarded the Nobel Prize in 1958

I. Strategy
• Determine the number of polypeptide chains (subunits)
• Determine the number of disulfide bonds (inter- and intrachain)
• Determine the amino acid composition of each polypeptide chain
• If subunits are too large, fragment them into shorter polypeptide chains
• Sequence each fragment using the Edman degradation method
• Complete the sequence by comparing overlaps of different sets of fragments

II. End-group Analysis
• The number of chains can be determined by identifying the number of N- and C-termini
• N-terminal analysis
– Dansyl chloride
– Phenyl isothiocyanate (PITC), the Edman reagent
– Aminopeptidase
• C-terminal analysis
– Carboxypeptidase

N-terminal Analysis with Dansyl Chloride
• Main reagent: 1-dimethylaminonaphthalene-5-sulfonyl chloride (dansyl chloride)
• The dansyl polypeptide chain is prepared
• Acid hydrolysis liberates all the amino acids, including the N-terminal dansyl amino acid
• The amino acids are separated
• The fluorescence of the dansyl amino acid is detected
• The N-terminal amino acid is identified by comparison with standard dansylated amino acids

N-terminal Analysis: Edman Degradation
• Nucleophilic attack of the N-terminal amino group on phenyl isothiocyanate (PITC), the Edman reagent, under mildly alkaline conditions (N-methylpiperidine/water/methanol)
• Formation of a phenylthiocarbamyl derivative (PTC-peptide)
• Anhydrous trifluoroacetic acid (TFA) cleaves the N-terminal amino acid as a thiazolinone derivative, leaving the other peptide bonds intact
• The thiazolinone (TZ) derivative is extracted into an organic solvent (e.g.
n-butyl chloride)
• The shortened peptide carries a free amino terminus
• The extracted TZ derivative is treated with aqueous acid (25% TFA/water) to form the phenylthiohydantoin (PTH) derivative
• The PTH amino acid is detected by UV absorption at 269 nm
• The PTH amino acid is separated from the other components by chromatography or electrophoresis
• The terminal amino acid is identified by its retention time or mass
• The cycle can be repeated to identify all the amino acids in short peptide chains (40-60 amino acids long)

Edman Degradation on Protein Sequencer
[Figure: Perkin Elmer Applied Biosystems Model 494 Procise protein/peptide sequencer, http://www.biotech.iastate.edu/facilities/protein/nsequence494.html]

[Figure: By-products of Edman degradation]

N- and C-terminal Analysis: Exopeptidase Method
• Exopeptidases cleave the terminal residue of a polypeptide chain
• Aminopeptidases cleave N-terminal residues
• Carboxypeptidases cleave C-terminal residues
• Aminopeptidases and carboxypeptidases are highly specific, so they are of limited use: some residues are released slowly and others resist cleavage

III. Disulfide Bond Cleavage
• Disulfides are reduced to thiols with dithiothreitol (DTT) or 2-mercaptoethanol
• The thiols are treated with alkylating agents (e.g. iodoacetic acid) to prevent re-oxidation during subsequent steps

[Figure: Protection of sulfhydryl groups]

IV. Separation and Molecular Weight Determination of Subunits
• Traditional methods
– SDS-PAGE, SEC, or RP-HPLC is used to separate the subunits after cleavage of the disulfide bonds
– Mw standards and a calibration curve are used to determine the molecular weights
– The approximate number of amino acids can be estimated from the Mw of the subunit, using 110 Da as the average molar mass per amino acid residue
• Recent methods
– MALDI mass spectrometry: faster and more accurate

V.
Amino Acid Composition
• Strategy
– Hydrolysis followed by separation and identification
• Acid-catalyzed hydrolysis
– 6 M HCl, 100-120 °C, 24 h (in an oxygen-free environment to prevent oxidation of SH groups)
– Some residues are degraded under these harsh conditions
• Base-catalyzed hydrolysis
– 4 M NaOH, 100 °C, 4-8 h
– Arg, Cys, Ser, and Thr are decomposed, and other amino acids are deaminated and racemized
– Used mainly to determine Trp, which is extensively degraded under acid-catalyzed hydrolysis
• Enzymatic hydrolysis
– Carried out with exo- and endopeptidases
– A combination of endo- and exopeptidases must be used to hydrolyze all the peptide bonds
• Separation
– The individual amino acids in the hydrolyzed mixture can be separated by RP-HPLC or CE and identified by retention time
• Increasing sensitivity
– Pre- or post-column derivatization is used to increase sensitivity

[Figure: Derivatization with OPA and MCE]

VI. Cleavage of Specific Peptide Bonds
• Direct sequencing is applicable only to peptides of up to about 50 residues
• Problems that arise over many reaction cycles
– Incomplete reactions
– Accumulation of impurities from side reactions
• Solution: use enzymes to break the polypeptide chain into shorter fragments
– Proteolytic enzymes: endopeptidases and exopeptidases

Enzymatic Fragmentation
• Trypsin
– Trypsin is the most commonly used proteolytic enzyme. It cleaves on the carboxyl side of the positively charged amino acids (Arg and Lys) if the next residue is not proline.
– It is highly specific
– Cleavage sites may be removed or added by derivatization to take advantage of the specificity of trypsin
– Reaction times can be adjusted to limit proteolysis if there are too many Arg and Lys residues
– Non-denaturing conditions can also be used to limit proteolysis

[Figure: Trypsin digestion]

[Figure: Derivatization of Cys for tryptic digestion]

Other Proteolytic Enzymes
• Endopeptidases
– Pepsin: cleaves on the amino side of Phe, Tyr, and Trp if the previous residue is not proline
– Chymotrypsin: cleaves on the carboxyl side of Phe, Trp, and Tyr if the next residue is not proline
– Endopeptidase Glu-C: cleaves on the carboxyl side of Glu
• Exopeptidases
– Leucine aminopeptidase: rapidly cleaves N-terminal leucine residues; does not cleave N-terminal proline
– Aminopeptidase M: cleaves all N-terminal residues
– Carboxypeptidase A: cleaves all C-terminal residues except Arg, Lys, and Pro
  • Especially efficient for amino acids with bulky aliphatic or aromatic side chains
  • Does not cleave if the adjacent (penultimate) residue is Pro
– Carboxypeptidase B: cleaves C-terminal Arg and Lys if the adjacent (penultimate) residue is not Pro
– Carboxypeptidase C: cleaves C-terminal residues

Chemical Fragmentation Methods
• Cyanogen bromide (CNBr) specifically cleaves on the carboxyl side of Met residues, forming a homoserine lactone
[Figure: CNBr cleavage, steps 1 to 4]

Sequence Determination
• Separate the fragments by chromatography or electrophoresis and sequence them individually
• Edman degradation is the method of choice
– Fully automated systems that use the Edman degradation method are available commercially (sequenator)
• In the sequenator the protein is immobilized by bonding it to a solid support or by adsorbing it onto an inert glass frit.
• Controlled amounts of reagents are injected by a pumping system
• The thiazolinone is transferred to a conversion chamber for conversion to the PTH amino acid
• The final product, the PTH amino acid, is pumped onto an HPLC column for on-line analysis
– An analysis time of 1 hour is possible for 50 amino acid residues

[Figure: The solid-phase matrix: the Merrifield resin]

[Figure: Edman degradation]

Ordering of Peptide Fragments
• Compare the amino acid sequences of one set of peptide fragments with the sequences of a second set of fragments obtained using different cleavage points

Determination of Disulfide Bond Position
• Digest the polypeptide chain(s)
• Run a 2D gel of the fragment mixture using the same conditions in both dimensions
• After separation in the first dimension, the matrix is exposed to performic acid, which cleaves all the disulfide bonds
• Separation in the second dimension is then performed
– Fragments without S-S bonds lie along the diagonal of the matrix
– Fragments that were linked by S-S bonds produce off-diagonal spots
– The disulfide-linked fragments can be extracted from the gel and sequenced

Protein Sequencing by Mass Spectrometry
• Digest the protein
• Obtain the MALDI-TOF mass spectrum of the digest
• Use an online database to match the fragment pattern with those in the database
• Obtain the sequence of the fragments by performing MS/MS
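
Example: in-silico tryptic digest and peptide mass matching
The digest-and-match idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular database search tool works. It assumes complete digestion, unmodified residues (Cys is not alkylated here), and singly protonated ions. The function names (tryptic_digest, peptide_mass, match_peaks), the tolerance, and the test sequence are hypothetical. The residue masses are standard monoisotopic values and do not come from the slides. The cleavage rule is the one given for trypsin: cut after Arg or Lys unless the next residue is Pro.

```python
# Monoisotopic residue masses in Da (standard reference values, not from the slides).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one water per peptide accounts for the free N- and C-termini
PROTON = 1.00728   # added to give the singly protonated ion, [M+H]+


def tryptic_digest(sequence):
    """Split a one-letter sequence after K or R, unless the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide with no K/R at its end
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide in Da."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def match_peaks(peptides, observed_mz, tol_da=0.5):
    """Pair each predicted [M+H]+ value with observed peaks within tol_da."""
    matches = []
    for peptide in peptides:
        predicted = peptide_mass(peptide) + PROTON
        for mz in observed_mz:
            if abs(mz - predicted) <= tol_da:
                matches.append((peptide, mz))
    return matches


if __name__ == "__main__":
    fragments = tryptic_digest("AAKPGRCC")   # made-up test sequence
    print(fragments)                         # ['AAKPGR', 'CC']
    for fragment in fragments:
        print(fragment, round(peptide_mass(fragment) + PROTON, 4))
```

Real searches also have to allow for missed cleavages, chemical modifications (for example alkylated Cys), and instrument-dependent mass tolerances, none of which this sketch handles.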