Mass Spectrometry in Life Science: Technology and Data-Evaluation
H. Thiele
Bruker Daltonik, Germany

Bridging Proteomics & Genomics
Functional Genomics: Proteomics, Genomics
MALDI-TOF Mass Spectrometry
• Proteome Analysis: investigation of protein diversity
  – Identification
  – No a priori knowledge about analyte
• SNP Genotyping: search for genetic variations
  – Screening
  – Analyte of known MW

The Technology
Mass Spectrometer for Biopolymer Research

Principle of MALDI-TOF-MS
[Figure: linear TOF instrument. Labels: vacuum lock, vacuum system, sample (analyte molecules in matrix), laser, acceleration plate, grids, drift region, linear flight tube, ion detector; mass spectrum with flight time converted to m/z]
• All ions with Ekin = 1/2 mv²
• Space/energy uncertainty
• 20 to 200 spectra have to be added; total duration 2 to 20 seconds with a 50 (200) Hertz laser

High resolution TOF-MS with Reflector
[Figure: reflector TOF instrument. Labels: MALDI ion source, ion reflector (0 V to + kV), ion detector; HiRes mass spectrum with flight time converted to m/z]
• The reflector focuses ions of the same mass but different Ekin (velocity) on the detector; high resolution is obtained

MS/MS by PSD
• MS/MS = fragment ion or tandem mass spectrometry
• PSD = Post Source Decay

PSD by Reflectron TOF (Scheme)
[Figure: electric potential and ion energy along the flight path from source to reflector; Segments 1 to 4; adjustment of voltages]
• Metastable decay of molecular ions; energy is reduced according to mass ratio
• E = 1/2 mv², v = const.
• E.g. if M+ = 1000: m = 500 has 4 keV, m = 100 has 0.8 keV, m = 25 has 0.2 keV (200 eV)

TOF-MS/MS by PSD
[Figure: MALDI ion source, parent ion selector, ion reflector, ion detector; reflector field stepped from strong to progressively weaker]
• Manual operation: 20 - 40 minutes; automatic operation: 5 - 10 minutes per daughter ion spectrum
• Adjustment of voltages (100 acquisitions in each segment)
• The daughter ion spectrum can only be measured in segments, which have to be pasted together. 10 - 15 segments are necessary.
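The two energy relations used on the TOF and PSD slides can be checked numerically. The following is a minimal sketch, not instrument software: the function names, the 20 kV accelerating voltage and the 1 m drift length are illustrative assumptions, and the 8 keV parent energy is inferred from the slide's example (a fragment of half the parent mass carrying 4 keV).

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C (CODATA)
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg (CODATA)


def flight_time_us(mz, accel_voltage_v, drift_length_m):
    """Drift time in microseconds for an ion with the given m/z (u per charge).

    From z*e*U = 1/2 m v^2 it follows that v = sqrt(2 e U / (m/z)),
    and the drift time is t = L / v.
    """
    velocity = math.sqrt(
        2.0 * ELEMENTARY_CHARGE * accel_voltage_v / (mz * ATOMIC_MASS_UNIT)
    )
    return drift_length_m / velocity * 1e6


def psd_fragment_energy_kev(parent_mass, fragment_mass, parent_energy_kev):
    """Kinetic energy of a PSD fragment.

    The fragment keeps the parent's velocity, so E = 1/2 m v^2 scales
    with the mass ratio fragment_mass / parent_mass.
    """
    return parent_energy_kev * fragment_mass / parent_mass


# Hypothetical instrument settings: 20 kV acceleration, 1 m drift length.
t_1000 = flight_time_us(1000.0, 20_000.0, 1.0)
t_4000 = flight_time_us(4000.0, 20_000.0, 1.0)

# Parent ion M+ = 1000 with an assumed 8 keV kinetic energy.
energies = {
    m: psd_fragment_energy_kev(1000.0, m, 8.0) for m in (500.0, 100.0, 25.0)
}

print(f"t(m/z 1000) = {t_1000:.2f} us, t(m/z 4000) = {t_4000:.2f} us")
for mass, energy in energies.items():
    print(f"fragment m = {mass:g}: {energy:g} keV")
```

Flight time grows with the square root of m/z, while the fragment energy grows linearly with fragment mass. Light fragments therefore carry only a small share of the parent energy, which is why a PSD spectrum has to be recorded segment by segment with stepped reflector voltages.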
[Figure: daughter ion mass spectrum assembled from segments 4, 3, 2, 1]

• In proteomics, many proteins have to be separated and analysed fast to avoid degradation
• Regarding structure information, MALDI MS/MS appears to be optimal, but PSD is much too slow!
• Consequence: development of a fast MALDI MS/MS instrument!
• MALDI TOF/TOF with post-acceleration by potential LIFT

TOF/TOF with LIFT (Scheme)
[Figure: electric potential and ion energy along the flight path. Labels: Source, 1. TOF, LIFT, 2. TOF, Reflector]
• All fragment ions can be analyzed simultaneously, no segmenting necessary
• Decaying ions: energy reduced, low speed
• Potential is switched when ions are in LIFT
• Even low mass ions have high energy, good for detection

TOF-MS/MS with post-acceleration by LIFT
[Figure: MALDI ion source, LID, collision cell (CID), parent ion selector, potential LIFT for post-acceleration, parent ion suppressor, ion reflector, ion detector; daughter ion mass spectrum]
• MS/MS spectrum of daughter ions is measured in a single acquisition; no pasting of segments
• 1 to 200 spectra needed; 1 to 10 seconds only with a 20 Hertz laser
• Low sample consumption, high speed, high sensitivity

Data Evaluation
Goal: Identification of Proteins (sequence of amino acids) and Protein modifications
Method:
– Fragmentation of proteins / peptides resulting in PMF / PFF spectra
– Detection (annotation) of the masses of the fragments
– Identification by database searches

Problems to be solved by Bioinformatics
- Detection of peaks with low signal/noise ratio
- Identification (mass, area, intensity) of (overlapping) isotopic patterns
- Score the results
- Detection of multiple charges (TOF spectra z = 1, 2)

Detection of protonated molecular ion [M+H]+
[Figure: isotopically resolved peak group with nominal mass, average mass and monoisotopic mass marked]

Isotopic pattern of peptides
Isotopologue                              Abundance   m
¹²C₉₃ ¹H₁₄₆ ¹⁴N₂₄ ¹⁶O₂₄ ³²S⁺              monoisotopic
¹²C₉₃ ¹H₁₄₆ ¹⁴N₂₃ ¹⁵N ¹⁶O₂₄ ³²S⁺          8.1%        2094.0455
¹²C₉₃ ¹H₁₄₆ ¹⁴N₂₄ ¹⁶O₂₄ ³³S⁺              0.7%        2094.0478
¹²C₉₂ ¹³C ¹H₁₄₆ ¹⁴N₂₄ ¹⁶O₂₄ ³²S⁺          88.9%       2094.0517
¹²C₉₃ ¹H₁₄₆ ¹⁴N₂₄ ¹⁶O₂₃ ¹⁷O ³²S⁺          0.9%        2094.0526
¹²C₉₃ ¹H₁₄₅ ²H ¹⁴N₂₄ ¹⁶O₂₄ ³²S⁺           1.4%        2094.0547

Deisotoping: Assigning monoisotopic masses
SNAP approach:
• Peak selection
  – Damping of chemical noise using FFT filtering
  – Baseline correction
  – Noise calculation
  – Peak search
• Iterative search for isotopic patterns
  – Analysing the largest peaks first
  – Alignment of patterns using peak list heuristic and FFT deconvolution
  – Nonlinear fit using asymmetric line shape
  – Subtraction of analysed patterns
• Reevaluation
  – Fit of intensities of overlapping patterns, optional addition of ICAT masses
  – Calculation of Quality Factor

SNAP: Regularized FFT Deconvolution
• Uncertainty of mean peptide isotopic distribution

SNAP: Nonlinear Fit
• Local optima for least-squares fit
• Exponentially modified Gaussians for asymmetric line shapes

SNAP: Quality Factor
• Idea: get a value for the quality of a pattern which can be used in preference to S/N or intensity for selecting the "best" peaks
[Diagram: inputs Area/Width, mean deviation for all patterns, kind of spectrum/instrument; stages Basic Scoring, Fuzzy Scoring; output Quality factor]

SNAP: Use Case
• From overlapping peak groups to monoisotopic masses

Wavelet Methods for Denoising Proteomics Spectra
Denoising by Hard Thresholding
• Wavelet Transform → Hard Thresholding → Inverse Wavelet Transform
• Scale-adaptive thresholds
• Preservation of position, shape and amplitude of major peaks
Further Developments
• Baseline Correction
• Deconvolution of Isotopic Patterns
• Scale-Energy Parameters for enhanced Clustering

Charge Deconvolution: Without Isotopic Resolution
Charge states for ESI:
• Protein: Z = 15-70
• Peptide: Z = 1, 2, 3, 4
• Small molecules: Z = 1
Different m/z peaks of the protein equine apomyoglobin: MW is calculated from m/z differences between adjacent peaks by deconvolution software (result see inset).
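How a molecular weight follows from the spacing of adjacent charge-state peaks can be shown in a few lines. For neighbouring peaks carrying z and z+1 protons, m1 = M/z + p and m2 = M/(z+1) + p, so z = (m2 - p) / (m1 - m2). The sketch below is an illustration of that relation only, not the deconvolution software referred to on the slide; the function names and the synthetic mass of 16950 u are assumptions.

```python
PROTON_MASS = 1.007276  # u, mass of one charge-carrying proton


def mz_of(mass, z):
    """m/z of the ion [M + zH]z+ for a neutral mass M."""
    return mass / z + PROTON_MASS


def charge_of_higher_mz_peak(mz_high, mz_low):
    """Charge z of the peak at mz_high.

    mz_low must be the neighbouring peak that carries one more
    proton (charge z + 1) and therefore sits at lower m/z.
    """
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


def neutral_mass(mz, z):
    """Neutral molecular mass from one peak and its charge."""
    return z * (mz - PROTON_MASS)


# Synthetic example: three adjacent charge states of a 16950 u protein.
TRUE_MASS = 16950.0
peaks = [mz_of(TRUE_MASS, z) for z in (17, 16, 15)]  # ascending m/z

z_est = charge_of_higher_mz_peak(peaks[2], peaks[1])
mass_est = neutral_mass(peaks[2], z_est)
print(f"z = {z_est}, M = {mass_est:.1f} u")
```

Real deconvolution software would use the whole charge-state envelope and average over all adjacent pairs; a single pair is enough to show the arithmetic.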
Related Ion Deconvolution
• Peak Picking: m/z; intensity
• Deconvolution: envelope; distances
• Result: Z + MW
[Figure: ESI mass spectrum, [M+zH]z+/z, m/z 800 to 1400; charge-state labels M12+ and M14+ to M20+; labelled peaks at m/z 849.1, 893.7, 943.0, 998.1, 1130.7, 1211.5, 1304.7, 1413.6; inset (16930 to 16970): deconvoluted mass M = 16950.584]

Charge Deconvolution: Isotopic Resolution
For isotopically resolved patterns the charge state and the mass can be determined from a single pattern.
• (M+5H)5+ at m/z 1148: d(m/z) = 0.2 u
• (M+4H)4+ at m/z 1434: d(m/z) = 0.25 u

Problems to be solved by Bioinformatics
Get more accurate data: Calibration

Automatic "Smart" Calibration
[Diagram: mass distribution of peptides gives statistical references; contaminants and self digestion give internal calibrants; external calibration spots give external calibration; all feed the automatic "smart" calibration]
• Automatic control based on external and internal data
• Resulting accuracy < 10 ppm
• High precision correction improves stability & accuracy
Tof(m/z) = c0 + c1 (m/z)^(1/2) + c2 (m/z) + fixed high precision correction

Statistical Calibration for Proteomics
Flow: Peaklist + Statistical Reference Masses → Assign Masses (dM < dErr) → Calibrate → dErr := Max(50, 0.5*dErr) → dErr >= 50? (Yes: repeat; No: Stop)
• Initial error dErr < 500 ppm
• Using modified Mann's clustering
• Resulting accuracy < 20 ppm

Details of the Calibration Routine: Internal Multipoint Calibration, an Example
Steps:
1. Matching with contaminants, exclusion limit 800 ppm; calibration, reject unmatched masses
2. 1. calibration round, exclusion limit 150 ppm; calibration, reject inaccurate masses
3. 2. calibration round, exclusion limit 40 ppm; calibration, reject inaccurate masses
4. Final calibration

Measured mass [Da] and error [ppm] of the matched calibrants at each stage:

Peak list (matched)   After matching      After 1. round      Final calibration
mass        error     mass        error   mass        error   mass        error
 843.0081    591       842.4952    -18     842.5338     28     842.5469     44
1046.1874    596      1045.5582     -6    1045.5679      4    1045.5792     14
1062.1533    556      1061.5150    -45    1061.5225    -38    1061.5336    -27
1068.1865    592      1067.5447     -9    1067.5513     -3    1067.5623      8
1077.9011    653      1077.2538     52    1077.2590     57    (rejected)
1303.4928    597      1302.7164      1    1302.6896    -20    1302.6984    -13
1476.6355    600      1475.7600      7    1475.7086    -28    1475.7158    -23
2212.5501    654      2211.2533     67    2211.0974     -3    2211.0978     -3
2226.5907    661      2225.2859     74    2225.1280      4    2225.1283      4
2240.6103    658      2239.2975     72    2239.1376      1    2239.1377      1
2274.5346    657      2273.2024     71    2273.0376     -2    2273.0374     -2
2299.6929    225      2298.3462   -361    (rejected)
2385.5507    670      2384.1549     85    2383.9745      9    2383.9732      9
Average error                   66.7 ppm            16.3 ppm            13.4 ppm

Unmatched masses of the original peak list (rejected in step 1): 903.9288, 1023.2356, 1119.1784, 1242.4039, 1273.4572, 1317.4594, 1431.6357, 1749.5326, 1805.0227, 1821.0056, 1827.9984, 1844.0284, 1925.1300, 1929.1918, 1942.1387, 2422.7973, 2430.9228, 2718.8983

Iterative Generation of internal calibrant list
• Start of PMF identification with a default calibrant list
• Cycle: Calibration → PMF Search → Generation of an improved calibrant list; usually 2 repeats are sufficient
• The default calibrant list usually consists of three typical trypsin peptides
• Improved calibrant lists typically contain 60-100 masses; on average 10-20 of these can be found in a spectrum

Problems to be solved by Bioinformatics
Search Engines: MS based Identity Search

MS Protein Identification is Probability based
How closely does a given protein or peptide sequence match the measured masses?
There are several strategies for a matching "score", for example:
- Probability based MOWSE score (Mascot)
- Bayesian probability (ProFound)
- Cross correlation (MS-Fit)
• Masses determined by MS are not unique
• Identification is probability based
• Problem of assigning true probabilities to a given identification

Evaluation of PMF and Search Engines
• Part 1: Comparison of the performance of the search engines using a typical set of search parameters.
• Part 2: Successively changing various search parameters to test their influence. Optimisation of search parameters.
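The internal multipoint calibration example earlier in this section (match calibrants within an exclusion limit, calibrate, tighten the limit, reject inaccurate masses) can be sketched as a short loop. This is a toy illustration under loud assumptions: a single scale factor stands in for the instrument's calibration function, the peak and reference masses are synthetic, and all function names are hypothetical. Only the exclusion limits of 800, 150 and 40 ppm are taken from the slide.

```python
def ppm(value, reference):
    """Signed relative error in parts per million."""
    return (value - reference) / reference * 1e6


def fit_scale(pairs):
    """Least-squares k for reference ~ k * raw (toy calibration model)."""
    numerator = sum(ref * raw for raw, ref in pairs)
    denominator = sum(raw * raw for raw, _ in pairs)
    return numerator / denominator


def calibrate(raw_peaks, references, limits_ppm=(800.0, 150.0, 40.0)):
    """Match, fit and tighten the exclusion limit round by round.

    Returns the final scale factor and the (raw, reference) pairs
    that survived the last exclusion limit.
    """
    k = 1.0
    pairs = []
    for limit in limits_ppm:
        pairs = []
        for ref in references:
            # Closest peak under the current calibration.
            raw = min(raw_peaks, key=lambda p: abs(k * p - ref))
            if abs(ppm(k * raw, ref)) <= limit:
                pairs.append((raw, ref))
        if not pairs:
            break  # nothing matched; keep the previous calibration
        k = fit_scale(pairs)  # recalibrate on the surviving calibrants
    return k, pairs


# Synthetic data: every peak reads 600 ppm high, the calibrant at 1200
# is additionally shifted by 0.12 Da, and 1500.0 is an unrelated peak.
references = [800.0, 1000.0, 1200.0, 1600.0, 2000.0, 2400.0]
raw_peaks = [r * 1.0006 for r in references]
raw_peaks[2] += 0.12
raw_peaks.append(1500.0)

k, pairs = calibrate(raw_peaks, references)
for raw, ref in pairs:
    print(f"{ref:7.1f}  {ppm(k * raw, ref):+.3f} ppm")
```

With these synthetic numbers the shifted calibrant survives the 800 and 150 ppm rounds, pulls the fit slightly off, and is dropped at 40 ppm, after which the remaining calibrants fit almost exactly. This mirrors the pattern in the slide's example, where the average error falls as inaccurate masses are rejected.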
Dataset: 168 MALDI PMF spectra; the data was acquired in the environment of a typical proteome project.
About 10,000 searches have been performed to establish a statistical basis.

Comparison of PMF Search Engines: Score Distribution
[Figure: three histograms of % of searches (0 to 20) against ProFound Z score (0.0 to 2.5), Mascot score (0 to 300) and log(MS-Fit MOWSE score) (0 to 6); the 5% significance level is marked for ProFound and Mascot]

                                                   Mascot       MS-Fit       ProFound
Correct identifications                            89 (53%)     55 (32.7%)   90 (53.6%)
Correct identifications above the 5%
significance level                                 63 (37.5%)   -            69 (41.1%)
Correct identifications above the highest score
obtained from an incorrect identification          54 (32.1%)   9 (5.6%)     49 (29%)

Converting the Scoring Distribution to a MetaScore
[Figure: ProFound scoring distribution, % of searches (0 to 20) against ProFound Z score (0.0 to 2.5), with the 5% significance level and the regions "random matches", "range of uncertainty" and "correct identifications"; second panel: MetaScore (0 to 100) against ProFound Z score (0 to 3)]
• Idea: integration of search results from different engines could improve significance and confidence!
• An effective ranking of results can be assessed by individual search score distributions

Ranking of Search Results of different PMF algorithms by MetaScore
- Effective sorting of reported results of several search engines
- More correct proteins are on rank number one
- Elimination of false positives
- Drawback: MetaScore does not reflect true probabilities

Problems to be solved by Bioinformatics
Search Engines: Automated validation of Search Results

From Automation to High Throughput
[Diagram: PMF spectrum (m/z) → result judgement (Fuzzy Engine, MetaScoring) → Identified? Yes: result visualization (MTP-Viewer). No: list of precursor masses (auto MS/MS definition, search result driven, queries) → MS/MS spectrum (m/z)]

Fuzzy Engine for Protein Identification from PMF spectra
[Diagram: inputs Probability Score, Score Ratio to unrelated Sequence, Sequence Coverage, Correlation Coefficient, Peak Quality Factor → FL → outputs Identified, Identified (multiple), Uncertain (unique), Uncertain (multiple), Undefined, Bad data]

Problems to be solved by Bioinformatics
Automation & High Throughput: Automated MS/MS Precursor Ion Selection
• Strategies for automated MS/MS acquisition

Acknowledgement
Bruker Daltonik
Jens Decker, Michael Kuhn
Martin Blüggel, Daniel Chamrad
Peter Maaß
Kristian Bredies