Protein Sequencing Research Group: Results of the PSRG 2012 Study
Terminal Sequencing of Standard Proteins in a Mixture
Year 1 of the 2-Year Study

Current PSRG Members
Henriette Remmer (Co-Chair)    University of Michigan
Jim Walters (Co-Chair)         Sigma-Aldrich
Robert English*                University of Texas Medical Branch
Pegah Jalili*                  Sigma-Aldrich
Viswanatham Katta              Genentech, Inc.
Kwasi Mawuenyega               Washington University School of Medicine
Detlev Suckau                  Bruker Daltonics
Bosong Xiang                   Monsanto, Co.
Jack Simpson (EB liaison)      United States Pharmacopeia
* new members added in 2011

PSRG 2012/13 - Study Background and Design
Status of Terminal Sequencing:
• The field is in the midst of a technology transition from classical Edman sequencing to mass spectrometry (MS) based sequencing.
• Both techniques have distinct strengths and weaknesses, and both have a role in biochemical research.
• With this complementary role recognized, we attempt to push the capabilities of the various sequencing techniques, namely terminal sequencing of proteins in a mixture.
Concept of the 2012 Study - Terminal Sequencing of Proteins in a Mixture:
• Sequencing proteins in a mixture requires separation of the proteins prior to analysis.
• Edman sequencing: SDS-PAGE and electroblotting prior to analysis; well established in most core facilities.
• MS-based sequencing: LC separation necessary prior to analysis; not well established in most core facilities.
=> PSRG designed a 2-year study
YEAR 1: Terminal sequencing and identification of three separated standard proteins
YEAR 2: Same three proteins distributed, this time in a mixture

PSRG 2012 Year 1: Study Objective
To obtain N-terminal sequence information on three standard proteins supplied as separated samples.

2011 Study Design - The Samples
Protein Name   Amount Provided    N-terminally blocked?   Fusion protein?   Comments
BSA            1 mg               No                      No                Reference protein / calibrant
Protein A      3 x 100 pmol       Yes                     Yes               Fusion protein with blocked N-terminus
Endostatin     3 x 100 pmol       No                      No                Contains two N-terminal variants

• Participants were asked to analyze the samples for terminal sequencing using any technology available.
• Participants received all three proteins, with ID, in amounts sufficient to sequence each protein using all three technologies.
• Feasibility of the analysis had been validated by PSRG members.
• Participants also filled out a survey; all responses were kept anonymous.

Participation and Survey Results
• 25 laboratories from 12 countries requested samples for Edman sequencing, and most of them (23) also for MS sequencing.
• 14 of the 25 participating laboratories (56%) completed the survey.
• 7 of the 14 labs used Edman sequencing, 6 top-down MS, and 6 bottom-up MS.
• Out of 14 respondents:
  - 9 labs analyzed the reference protein BSA; 8 correctly determined the N-terminus.
  - 13 labs analyzed Protein A; 5 correctly determined the N-terminus.
  - 14 labs analyzed Endostatin; 12 correctly determined the N-terminus, but only 7 identified the presence of the second N-terminus.

Participation and Survey Results
[Bar chart: number of respondents per protein (BSA, Protein A, Endostatin) who correctly determined the N-terminus vs. were unable to determine the N-terminus; vertical axis 0 to 15]

Survey Response Results
What types of analyses did you perform on the sample?
[Bar chart: percentage of respondents per analysis type; vertical axis 0% to 60%]
[Chart: Purification and separation method before analysis]

N-Terminal Techniques: Edman Degradation

Edman Workflows, PSRG 2012 Samples
Sample preparation:
• Used sample as provided (5)
• SDS-PAGE, blotting on PVDF (2)
• Blotting on PVDF (1)
Instruments:
• Shimadzu PPSQ-33A
• ABI Procise: 4 x 494 HT, 1 x 492 cLC, 2 x 494 cLC

Edman sequencing of Protein A (C10)
• Protein A: fusion protein, N-terminus blocked
• De-blocking (PGAP)
• 100 pmol
• Polybrene-precycled glass fiber filters
• ABI Procise Biosystems Model 494HT
Sequence 1 (C10):
  MMet L R P V E T P
  -    L R P V E T P

Edman sequencing of Endostatin (A00)
• H2O with 0.1% TFA, blotting on PVDF
• Shimadzu PPSQ-33A
• Initial yield: 36.95%
• Repetitive yield: 84.98%
• Probability 1: position 4, proline to arginine
• Probability 2: position 7, histidine to glutamine

Edman sequencing of Endostatin (A00)
Sequence 1 (A00/Variant 1):   DFQPVLHLVALNSPL
Sequence 2 (A00/Variant 2):   HSHRDFQPVLHLVAL
Information about the sequence: SwissProt output
Sequence verification: with BlastP

Summary of N-terminal sequencing results
Sample            Lab ID   Amino acid sequence
BSA               Y20      DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQXPFDEHVKLVN
                  C10      DTHKSEIAHRFKDLGEEHFKGLVLIAFSQY
                  N32      DTHKSEIAHRFKDLGEEHFKGLVLI
                  A00      DTHKSEIAHRFKDLGEEHFKGLVLIAFSQY
Protein A         Y20      FLRPVETPTREIKKLDGLAQHDEAQQNAFYQVLNMPN
                  Y20      MFLRPVETPT
                  C10      LRPVETPTREIKKLDGLAQHDEAQQNAFYQVL
                  N32      XLRPVETPXREIKKL
                  A00      MLRPVETPTREIKKLDGL
                  S10      XLRPVETPTREIKKLDGLAQHDEAQQNA
                  V00      FLRPVETPTREIKKLDGLAQHDEAQQNAFYQVLNMPN
Endostatin Seq. 1 Y20      DFQPVLHLVALNSPLSGGMRGIRGADFQXFQQA
                  C10      DFQPVLHLVALNSPLSGGMRGIRGADFQCFQQAR
                  E20      DFQPVLHLVALNSPLSGGMRGIRGADFQCFQQARAVGLAGT
                  N32      DFQPVLHLVALNSPLSGGMRGI
                  A00      DFQPVLHLVALNSPL
                  S10      DFQPVLHLVALNSPLSGGMRG
Endostatin Seq. 2 Y20      HSHRDFQP
                  C10      HSHRDFQPXLHXXALNXXXSGGM
                  E20      HSHRDFQPVLHLVALNSPLSGGMRGIRGADFQC
                  N32      HSHRDFQPVXHXVALNS
(X = residue not called)

PSRG 2011 Edman Conclusions & Observations
• Edman sequencing allows direct determination of a protein's N-terminal sequence.
• All labs returned N-terminal data that correlate well with the published protein sequences.
• Edman can produce the data with and without prior separation (SDS-PAGE and chromatography).
• No C-terminal data were produced with Edman.
• If the protein is N-terminally blocked, the reaction will not proceed for most, but not all, modifications.
• The reagents for Edman sequencing are very expensive.

N-Terminal Techniques Overview: MS Techniques

Mass Spectrometry Methods Used
Top-Down Sequencing (no digests)
• ISD, T³: AB Sciex 4800 MALDI-TOF/TOF MS
• ISD, T³: Bruker Ultraflex MALDI-TOF/TOF MS
• ETD, CID: Bruker maXis 4G UHR-QTOF
Only Top-Down N-term results were returned.
Some participants used Bottom-Up MS as a validation step.
Bottom-Up MS/MS (digests)
• MALDI-TOF/TOFs: AB/Bruker
• ESI-Orbitrap: Thermo

Top-Down Experimental
Sample separation: HPLC (Agilent 1200), direct infusion (Triversa Nanomate), or used as provided
Solvents: 0.1% TFA; MeOH/H2O/HOAc; 6M GndHCl; various organic/H2O/acid
Top-Down instrumentation:
• ISD, ISD/T³: Bruker Ultraflex, Bruker UltrafleXtreme, Bruker Autoflex speed, AB Sciex 4800
• ETD, CID: Bruker maXis 4G

Software used for MS Top-Down Analysis
• BioTools 3.2 (Bruker): sequence tags, automatic de novo sequencing, triggering of Mascot TD searches, result visualization, terminal assignments, TD report generation
• Mascot 2.3 (Matrix Science): TD and BU database searches
• BLAST/MS-BLAST (NIH, Harvard/EMBL): protein identification based on sequence tags
• ISDetect (Genentech; Y Gan et al., in prep.): sequence tags, semi-automatic de novo sequencing, result visualization

The Top-Down MS Standard Analysis Strategies
• MW determination: check sample quality + final QC
• ETD/ISD: obtain internal sequence tags, e.g. DTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCP
• ID protein: e.g. Mascot search
• Extend the sequence towards the N-terminus (and the C-terminus alike)
• Compare with the obtained protein sequences (incl. PTMs)
• T³-sequencing, i.e. MS/MS analysis of MALDI-ISD fragments
• Edman sequencing
Problems: unknown terminal modifications (Sample B), fusion proteins (Sample B), ragged ends (Sample C)

BSA ISD Spectrum in DAN matrix (PSRG123)
[ISD spectrum of BSA] BSA is a good calibrant for ISD spectra.

Sample A: BSA, ISD + Edman following the basic strategy (C10)
• BSA sequence, accession number: AAI02743
• c-ions in the MALDI-ISD spectrum revealed the sequence from Arg10 to Tyr30.
• Edman sequencing provided Asp1 to Gly15.
• Data from the two orthogonal methods were combined to obtain 30 residues of BSA sequence.
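The way the Edman and MALDI-ISD read-outs for BSA combine can be sketched in a few lines of Python. This is only an illustration, not the software any participant used: the residue ranges (Asp1 to Gly15, Arg10 to Tyr30) and the 30-residue sequence come from the slides, while the names `COVERAGE`, `covered_positions`, and `summarize` are hypothetical.

```python
# First 30 residues of BSA as reported in the study (Asp1 to Tyr30).
BSA_N_TERM = "DTHKSEIAHRFKDLGEEHFKGLVLIAFSQY"

# 1-based, inclusive residue ranges quoted on the slide.
COVERAGE = {
    "Edman": (1, 15),       # Asp1 to Gly15
    "MALDI-ISD": (10, 30),  # Arg10 to Tyr30
}


def covered_positions(ranges):
    """Map each covered 1-based position to the set of methods covering it."""
    positions = {}
    for method, (start, end) in ranges.items():
        for pos in range(start, end + 1):
            positions.setdefault(pos, set()).add(method)
    return positions


def summarize(ranges, sequence):
    """Summarize total coverage and the stretch confirmed by every method."""
    positions = covered_positions(ranges)
    covered = sorted(positions)
    by_all = [p for p in covered if len(positions[p]) == len(ranges)]
    return {
        "covered": len(covered),
        "contiguous": covered == list(range(covered[0], covered[-1] + 1)),
        "overlap_range": (by_all[0], by_all[-1]) if by_all else None,
        "overlap": "".join(sequence[p - 1] for p in by_all),
    }


summary = summarize(COVERAGE, BSA_N_TERM)
print(summary)
```

The overlapping stretch is the part of the sequence that both orthogonal methods confirm independently; everything outside it rests on a single method.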
FINAL SEQUENCE OBTAINED FOR BSA:
1       10         20         30         40
DTHKSEIAHR FKDLGEEHFK GLVLIAFSQY LQQCPFDEHV KLVNELTEF…
Coverage by Edman: residues 1 to 15. Coverage by MALDI-ISD: residues 10 to 30. Coverage by both: residues 10 to 15.

Sample B Endostatin (donated by Sigma)
Issues: ragged N-terminus, C-terminal loss of K
[Sequence figure; annotations: "C-term K excised", "added"]

Endostatin: Annotated ISD Spectrum from on/off gradient (L36)
[Annotated ISD spectrum; an interfering component is marked]

Endostatin (L36): HPLC chromatogram, separation of two variants, ISD of F1; F2 not assigned
[Chromatogram: Endostatin fractions F1 and F2 compared with a 100 pmol myoglobin standard]
• The recovery from the endostatin sample might be lower than 100 pmol.
• LC separation detected the protein heterogeneity and removed polymeric contamination, but reduced the sample amount and the readout length.

UHR-QTOF MS analysis of Endostatin (Z10): 2 Components
[Mass spectrum (+MS), m/z 1200 to 2000, intensity x10^5; labeled peaks at m/z 1221.9913, 1297.3352, 1390.0011, 1496.8469, 1621.4171, 1768.8184, 1945.6003]
In contrast to MALDI-ISD, the QTOF-ETD analysis takes place after precursor ion selection.

ETD Analysis of Endostatin, First Precursor (Z10): Mascot Database Search Result
Simplest use of Top-Down data: Mascot search

TDS Analysis of Endostatin, First Precursor (Z10): Deconvoluted and Annotated ETD Spectrum
[Annotated ETD spectrum; labeled c-ions include c2, c9, c26]

TDS Analysis of Endostatin, First Precursor (Z10): Mass Accuracy of Intact Protein
• Measured monoisotopic mass: 19433.8783
• Theoretical monoisotopic mass: 19433.8151 (C866H1340N250O250S6)
• Mass error: 3.2 ppm
[Deconvoluted (MaxEnt) +MS spectrum, 0.5-20.4 min, m/z 19436 to 19454; measured spectrum in black, simulated spectrum in red; isotope peaks from 19436.8229 to 19453.8638]
Precise MW determination confirms the proper N-terminus and the C-terminal loss of lysine.

Endostatin: TDS Sequence 1 (PSRG123)
Endostatin: TDS Sequence 2 (PSRG123)
If the ISD spectral quality is good, both sequences can be read directly, and N- and C-termini can be assigned from THE SAME SPECTRUM.

Rec. Protein A (donated by Repligen)
Issues: N-terminal methylation, fusion site after residue 18
[Sequence figure: E. coli β-glucuronidase fused to SPA_STAAU]
The C-terminal sequence does not match the intact MW (a nice challenge for Top-Down MS in the future).

ISD Spectrum Protein A (DAN) (E20)
[MALDI-ISD spectrum, 4700 Reflector Spec #1, MC=>BC=>SM5, base peak m/z 1056.5 (intensity 9640); axes: Mass (m/z) vs. % Intensity; annotated by manual sequence generation, with residue calls including I/L and K/Q ambiguities]

Protein A Identification (E20)
• The ISD spectrum for Sample #2 (Protein A) was manually interpreted by sequential subtraction of ions.
• The resultant sequence, TRE[IL][KQ][KQ][IL]DG[IL]A[KQ], was BLASTed against the Dayhoff public database. Only two sequences matched.
• Homology searching of the N-terminal tag provided a) β-glucuronidase and b) its N-terminally extended sequence; c) the mass offset indicates N-terminal methylation.

Protein A MS/MS of ISD c-ion m/z 1056.538 (E20)
T³-sequence analysis of c9 confirms N-terminal methylation.

Protein A (L36): MS/MS of N-terminal tryptic fragment M*LRPVETPTR
n    Residue   a          b          a-17       b-17       y          i
1    M*         118.068    146.063    101.042    129.037    175.119   118.068
2    L          231.153    259.147    214.126    242.121    276.167    86.096
3    R          387.254    415.249    370.227    398.222    373.219   129.113
4    P          484.306    512.301    467.280    495.275    474.267    70.065
5    V          583.375    611.370    566.348    594.343    603.310    72.081
6    E          712.417    740.412    695.391    723.386    702.378   102.055
7    T          813.465    841.460    796.439    824.433    799.431    74.060
8    P          910.518    938.513    893.491    921.486    955.532    70.065
9    T         1011.566   1039.560    994.539   1022.534   1068.616    74.060
10   R         1167.667   1195.662   1150.640   1178.635   1213.672   129.113
(M* = methylated Met. The a, b, a-17, b-17, and immonium (i) values refer to residue n counted from the N-terminus; the y values refer to the fragment of n residues counted from the C-terminus.)
Validation of the assigned N-terminal methylation and glucuronidase sequence by Bottom-Up LC-MALDI-TOF/TOF analysis.

Protein A (L36): Annotated ISD spectrum
The N-terminal sequence is β-glucuronidase fused with Protein A. The N-terminal methionine is methylated.
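The methylation assignment can be cross-checked against the b-ion masses tabulated for the N-terminal tryptic peptide. The sketch below is an illustration only: it assumes standard monoisotopic residue masses and a +14.01565 Da (CH2) shift on the N-terminal Met, and the helper names `b_ions` and `c_ion` are hypothetical, not taken from any software used in the study.

```python
# Standard monoisotopic residue masses (Da), limited to residues in this peptide.
RESIDUE_MASS = {
    "M": 131.04049, "L": 113.08406, "R": 156.10111, "P": 97.05276,
    "V": 99.06841, "E": 129.04259, "T": 101.04768,
}
PROTON = 1.00728         # mass of a proton
AMMONIA = 17.02655       # NH3; a c-ion is the matching b-ion plus NH3
METHYL_SHIFT = 14.01565  # CH2: net mass added by one methylation (assumption)

PEPTIDE = "MLRPVETPTR"   # N-terminal tryptic peptide from the slides

# b-ion values (b1 to b10) transcribed from the fragment table in the slides.
TABLE_B = [146.063, 259.147, 415.249, 512.301, 611.370,
           740.412, 841.460, 938.513, 1039.560, 1195.662]

OBSERVED_C9 = 1056.538   # ISD c-ion selected for T3 sequencing


def b_ions(peptide, n_term_shift=0.0):
    """Singly charged b-ion m/z values with an optional shift on residue 1."""
    masses, running = [], n_term_shift
    for aa in peptide:
        running += RESIDUE_MASS[aa]
        masses.append(running + PROTON)
    return masses


def c_ion(peptide, n, n_term_shift=0.0):
    """Singly charged c-ion m/z for the first n residues."""
    return b_ions(peptide[:n], n_term_shift)[-1] + AMMONIA


methylated_b = b_ions(PEPTIDE, METHYL_SHIFT)
max_dev = max(abs(calc - tab) for calc, tab in zip(methylated_b, TABLE_B))
c9_methylated = c_ion(PEPTIDE, 9, METHYL_SHIFT)
c9_unmodified = c_ion(PEPTIDE, 9)

print(f"largest b-ion deviation from table: {max_dev:.4f} Da")
print(f"c9 methylated {c9_methylated:.3f}, unmodified {c9_unmodified:.3f}, "
      f"observed {OBSERVED_C9}")
```

Under these assumptions the tabulated b-series is reproduced only when the shift is placed on Met1, and the methylated c9 lands near the observed ISD ion, while the unmodified c9 would sit about 14 Da lower.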
The N-terminal amino acids not confirmed by ISD were confirmed by MS/MS of the N-terminal tryptic fragment.

Results from MS Analyses
Please look at poster ##?? for more details.

Lessons to be Learned from this Year's Study
Mass Spec Lessons
1. Top-Down with ETD or ISD provides reliable N-terminal sequences.
2. Top-Down CID was the most easily misinterpreted.
3. Edman and Top-Down complement each other very well: Edman for the first ~10 residues, Top-Down for the inexpensive extension of calls (e.g. through the fusion site of Protein A).
4. Validation of the N-terminus by either T³-sequencing or Bottom-Up works as well.
5. Efficient use of Top-Down MS requires good software support.
6. Bottom-Up was great for confirming N-terminal results but not for generating them.
7. Use of protein HPLC resulted in shortened readouts.
8. Protein A: successful analysis of the fusion required considerable experience.
9. Endostatin: the ragged N-termini were recognized by those who determined the intact molecular weight(s) or detected heterogeneity by HPLC or Edman.
10. Top-Down by ETD or ISD permitted detection of the C-terminal removal of lysine; intact MW determination validated the finding.

Next Year's ABRF-PSRG 2013 Study: What's Going to Happen?
• Most likely, the same proteins will be provided again!
• But: provided as a stew in a single pot!
• Task: isolate/separate them from the mixture.
• Problem: SDS-PAGE works well for Edman, but it is difficult to extract intact proteins.
Hints:
• Protein LC needs to be established to get to the next level!
• Always try to get intact MW information!
• Use high sample amounts, as you lose a lot during LC.

The ABRF-PSRG Acknowledges the Following Support
• Recombinant Protein A was obtained as a donation from RepliGen (Waltham, MA).
• Endostatin was obtained as a donation from Sigma-Aldrich (St Louis, MO).
• Steve Smith (University of Texas Medical Branch) and Larry Dangott (Texas A&M University) performed Edman sequencing to provide reference data for this study.
End
The following slides are bonus material.

In-Source Decay (MALDI-ISD)
MALDI-ISD
• "Pseudo-MS/MS" technique, no precursor selection
• ISD of the protein in the MALDI plume at <nsec timescale (similar to ETD)
• Fragmentation due to radical transfer from matrix to analyte (Takayama, 2001)
• a, c-ions: N-terminus; y, z+2-ions: C-terminus; both termini are sequenced simultaneously
• TOF/TOF allows for T³-sequencing: MS/MS analysis of ISD fragments
MALDI-ISD and T³-Sequencing: Suckau & Resemann (2003) Anal Chem 75

ESI-ETD (Electron Transfer Dissociation)
CID
• Collision with inert gas
• Protein is internally heated globally
• It fragments in a statistical process
• Weak bond cleavages
ETD
• Collision with electron-donating gas
• Perturbs the electronic structure locally
• Resulting in local bond cleavages
• ETD fragments all backbone bonds (except at Pro)
• For top-down MS/MS of intact proteins with precursor ion selection

ETD Measurement Cycle on QTOF
1. Precursor ion accumulation
2. Electron transfer reagent addition
3. ETD reaction
4. Fragment ion transfer and detection
[Instrument schematic: n-CI source, reaction cell, 10 kHz]
Tsybin et al. (2011) Anal Chem 83:8919

ISD Endostatin (DAN) (E20)
[MALDI-ISD spectrum, 4700 Reflector Spec #1, MC, base peak m/z 1364.6 (intensity 6319); axes: Mass (m/z) vs. % Intensity; initial manual interpretation with residue calls including I/L and K/Q ambiguities]

Database search for [IL]SGGMRGNR[KQ]DF[KQ]CF (E20)
• The sequence read from the spectrum begins at m/z 1548.694, so a handful of residues must precede it.
• Excerpts from COIA1_HUMAN and COIA1_MOUSE were compared.
• Human and mouse differ at the -2 position from the start of the ISD sequence (i.e. LNSPL in human and LNTPL in mouse).

MS/MS of m/z 1364.6 (E20)
To confirm the N-terminus not covered in the ISD spectrum, MS/MS was performed on m/z 1364.6.
[MS/MS spectrum 010212_B23_10pmol_Endostatin_MSMS_2kV_1364.65; 4700 MS/MS, precursor 1364.65, Spec #1 MC=>BC=>NF0.7, base peak m/z 1364.6 (intensity 360); axes: Mass (m/z) vs. % Intensity; annotated b-ions (b2 to b5, b7 to b10), y-ions (y6, y7, y9), and immonium ions (P, K/Q, I/L, H)]

Determination of Endostatin N-termini by Edman degradation (E20)
• The major sequence matches COIA1_HUMAN at position 1576.
• A second sequence was found from position 1572.
• Both sequences concur with the ISD findings.
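The relationship between the two Edman sequences and their reported start positions can be checked with a short sketch. It is an illustration under stated assumptions: the two 15-residue sequences and the positions 1576 and 1572 are taken from the slides, and the function name `n_terminal_extension` and its `min_overlap` parameter are hypothetical.

```python
SEQ_MAJOR = "DFQPVLHLVALNSPL"   # major Edman sequence reported in the study
SEQ_SECOND = "HSHRDFQPVLHLVAL"  # second Edman sequence reported in the study
POS_MAJOR, POS_SECOND = 1576, 1572  # reported start positions in the parent entry


def n_terminal_extension(longer, shorter, min_overlap=5):
    """Return the residues of `longer` that precede the start of `shorter`.

    Slides `shorter` along `longer` and returns the leading residues at the
    first offset where the remaining tail of `longer` is a prefix of
    `shorter`. Returns None if no offset leaves at least `min_overlap`
    matching residues.
    """
    for offset in range(1, len(longer) - min_overlap + 1):
        if shorter.startswith(longer[offset:]):
            return longer[:offset]
    return None


extension = n_terminal_extension(SEQ_SECOND, SEQ_MAJOR)
print(extension, len(extension), POS_MAJOR - POS_SECOND)
```

A non-empty extension whose length equals the difference between the two start positions is what a ragged N-terminus looks like in Edman data: the same protein read from two start points.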
• Edman sequencing detected the ragged N-terminus; ISD confirmed and extended it.
• Largely manual analysis of the ISD spectra made it difficult to extract the full information.

2012/2013 PSRG: Timeline of the 2-Year Study
[Timeline figure]
Dates shown: Feb '11 (ABRF 2011), May '11, Aug '11, Sep '11, Oct '11, Jan '12, Mar '12, ABRF 2012, May '12, Jun '12, Oct '12, Feb '13 (ABRF 2013)
Milestones shown for Year 1 (2012):
• Discussed ideas for the 2012 study; agreement upon a study design
• Settled on the 3 standard proteins for distribution as separated proteins in year 1 of the study
• Study announcement
• Samples sent to participants
• Deadline for returning data
• Extended deadline for returning data
• Data analysis
Milestones shown for Year 2 (2013):
• Study announcement
• Distribution of proteins in mixture for year 2 of the study
• Data analysis

Comments
• "Un-reproducible recovery from the tube for Endostatin is a problem if one wants to optimize the settings or try to reproduce the data."
• "Thanks, PSRG! It was fun."
• "Unless I've missed something, the availability of the proteins in the public domain made this an easy project."
• "Sample quality was very good!"
• "I thought the fusion Protein A solution was blocked? I obtained sequence matches to the protein B-Glucuronidase; either B-Glucuronidase is fused to Protein A and you were not successful in blocking the protein, or B-Glucuronidase is a contaminant."
• "It was a very costly study for an Edman lab (reagents)."