BIFLEX III

Mass Spectrometry
in Life Science:
Technology and
Data-Evaluation
H. Thiele
Bruker Daltonik, Germany
Bridging Proteomics & Genomics
Functional Genomics
Proteomics
Genomics
MALDI-TOF Mass Spectrometry
Proteome Analysis
SNP Genotyping
Investigation of
protein diversity
Search
for genetic variations
Identification
No a priori knowledge
about analyte
MALDI-TOF MS
Screening
Analyte of known MW
The Technology
Mass Spectrometer
for
Biopolymer Research
Principle of MALDI-TOF-MS
Vacuum
lock
• all ions with Ekin = 1/2 m v^2
Vacuum system
Linear flight tube
Drift region
Sample plate
Analyte molecules in matrix
Acceleration grids
20 to 200 spectra have to be added;
total duration 2 to 20 seconds
with 50 (200) Hertz Laser
Ion detector
Mass spectrum
space/energy
uncertainty
Flight time
m/z
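The link between flight time and m/z in this scheme can be illustrated with a minimal sketch. It assumes an idealised linear instrument in which an ion of charge z·e is accelerated through a potential U and then drifts a length L, so that Ekin = z e U = 1/2 m v^2 and t = L * sqrt(m / (2 z e U)). The 20 kV and 1 m defaults are illustrative assumptions, not values from these slides.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C
DALTON = 1.66053906660e-27   # atomic mass unit in kg

def flight_time(mass_da, charge=1, voltage=20e3, drift_length=1.0):
    """Drift time in seconds for an ion in an idealised linear TOF.

    voltage (V) and drift_length (m) are illustrative assumptions.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2.0 * charge * E_CHARGE * voltage / mass_kg)
    return drift_length / velocity

for m in (1000.0, 4000.0):
    print(f"m = {m:6.0f} Da -> t = {flight_time(m) * 1e6:.2f} us")
```

Because t scales with sqrt(m/z), a fourfold mass gives twice the flight time under these assumptions.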
High resolution TOF-MS with Reflector
0V
MALDI ion
source
Ion
detector
The reflector focuses ions of same
mass but different Ekin (velocity)
on detector;
high resolution is obtained
+ kV
Ion reflector
HiRes mass spectrum
Flight time
m/z
MS/MS by PSD
MS/MS = fragment ion or
tandem mass spectrometry
PSD = Post Source Decay
PSD by Reflectron TOF (Scheme)
Electr. potential
ion energy
Metastable decay of molecular
ions,
energy is reduced according to
mass ratio
Adjustment of
voltages
Segment 1
Segment 2
Segment 3
Segment 4
E = 1/2 m v^2
v = const.
e.g. if M+ = 1000 (8 keV), m = 500 has 4 keV
m = 100 has 0.8 keV
m = 25 has 0.2 keV
Source
Reflector
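The energy partition in this scheme follows from E = 1/2 m v^2 at constant velocity: a fragment keeps the fraction m/M of the parent ion energy. A minimal sketch, assuming a parent energy of 8 keV (a value implied by the m = 500 example rather than stated on the slide):

```python
def fragment_energy(parent_energy_ev, parent_mass, fragment_mass):
    """Kinetic energy of a PSD fragment that keeps the parent velocity."""
    return parent_energy_ev * fragment_mass / parent_mass

PARENT_ENERGY_EV = 8000.0  # assumed: implied by m = 500 carrying 4 keV

for m in (500, 100):
    print(m, fragment_energy(PARENT_ENERGY_EV, 1000, m), "eV")
```

The wide spread of fragment energies is why the reflector voltages have to be stepped segment by segment in PSD.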
TOF-MS/MS by PSD
Manual operation: 20 – 40 minutes;
automatic operation: 5 – 10 minutes
Adjustment of voltages
per daughter ion spectrum
Reflector field per segment: strong, weak, weaker, weaker
(100 acquisitions in each segment)
MALDI ion
source
Parent ion
selector
Ion reflector
Ion
detector
The daughter ion spectrum can
only be measured in segments
which have to be pasted together.
10 - 15 segments are necessary.
Daughter ion mass spectrum (segments 4, 3, 2, 1)
In proteomics,
many proteins have to be separated
and analysed fast to avoid degradation
Regarding structure information,
MALDI MS/MS appears to be optimal,
but PSD is much too slow!
Consequence: Development of a
fast MALDI MS/MS instrument!
MALDI TOF/TOF
with post-acceleration
by potential LIFT
TOF/TOF with LIFT (Scheme)
All fragment ions can be
analyzed simultaneously,
no segmenting necessary
Electr. potential
ion energy
1. TOF
2. TOF
Potential is
switched when
ions are in LIFT
Decaying ions,
energy reduced,
low speed
Source
Even low mass ions
have high energy,
good for detection
LIFT
Reflector
TOF-MS/MS with post-acceleration by LIFT
LID
Potential
LIFT for post
acceleration
MALDI ion
source
Parent ion
suppressor
Parent ion
selector
Collision
Cell (CID)
Ion
detector
MS/MS spectrum of daughter ions
is measured in a single acquisition;
no pasting of segments;
low sample consumption,
high speed, high sensitivity
Only 1 to 200 spectra needed;
1 to 10 seconds with 20 Hertz laser
Ion reflector
Daughter ion mass spectrum
Data Evaluation
Goal :
Identification of Proteins (sequence of amino acids)
and Protein modifications
Method :
– Fragmentation of proteins / peptides
resulting in PMF / PFF spectra
– Detection (annotation) of the masses of the fragments
– Identification by database searches
Problems to be solved by Bioinformatics
- Detection of peaks with low signal/noise ratio
- Identification (mass, area, intensity) of (overlapping) isotopic patterns
- Score the results
- Detection of multiple charges (TOF spectra z = 1,2)
nominal mass
Detection of
protonated molecular ion
[M+H]+
average mass
monoisotopic mass
Isotopic
resolution
Isotopic pattern of peptides
Monoisotopic: 12C93 1H146 14N24 16O24 32S+
Species with one heavy isotope (first isotopic peak):
12C93 1H146 14N23 15N 16O24 32S+ : 8.1%, m=2094.0455
12C93 1H146 14N24 16O24 33S+ : 0.7%, m=2094.0478
12C92 13C 1H146 14N24 16O24 32S+ : 88.9%, m=2094.0517
12C93 1H146 14N24 16O23 17O 32S+ : 0.9%, m=2094.0526
12C93 1H145 2H 14N24 16O24 32S+ : 1.4%, m=2094.0547
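The relative contributions of the individual heavy isotopes to the first isotopic peak can be estimated from natural abundances. The sketch below is an approximation under stated assumptions: the abundance values are rounded textbook numbers (not taken from the slides), and each element's weight is its atom count times the heavy-to-light abundance ratio. The function name `m_plus_1_shares` is hypothetical.

```python
# Natural abundances (light isotope, isotope one mass unit heavier).
# Rounded textbook values, treated here as assumptions.
ABUNDANCE = {
    "C": (0.9893, 0.0107),      # 12C, 13C
    "H": (0.999885, 0.000115),  # 1H, 2H
    "N": (0.99636, 0.00364),    # 14N, 15N
    "O": (0.99757, 0.00038),    # 16O, 17O
    "S": (0.9499, 0.0075),      # 32S, 33S
}

def m_plus_1_shares(formula):
    """Share of each element in the peak carrying exactly one heavy isotope."""
    weights = {
        el: n * ABUNDANCE[el][1] / ABUNDANCE[el][0]
        for el, n in formula.items()
    }
    total = sum(weights.values())
    return {el: w / total for el, w in weights.items()}

shares = m_plus_1_shares({"C": 93, "H": 146, "N": 24, "O": 24, "S": 1})
for el, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{el}: {100 * share:.1f}%")
```

With these abundances 13C dominates the peak, followed by 15N, in line with the ordering of the percentages listed for this peptide; the exact figures depend on the abundance table used.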
Deisotoping: Assigning monoisotopic masses
SNAP approach:
• Peak selection
– Damping of chemical noise using FFT filtering
– Baseline correction
– Noise calculation
– Peak search
• Iterative search for isotopic patterns
– Analysing the largest peaks first
– Alignment of patterns using peak list heuristic and FFT deconvolution
– Nonlinear fit using asymmetric line shape
– Subtraction of analysed patterns
• Reevaluation
– Fit of intensities of overlapping patterns, optional addition of ICAT
masses
– Calculation of Quality Factor
SNAP : Regularized FFT Deconvolution
Uncertainty of
mean peptide
isotopic distribution
SNAP : Nonlinear Fit
Local optima for least-squares fit
Exponentially modified Gaussians for asymmetric line shapes
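For reference, one common parameterisation of an exponentially modified Gaussian (a Gaussian with centre mu and width sigma convolved with an exponential decay of rate lam) is sketched below. Whether SNAP uses exactly this form is not stated on the slide, so the function and its parameter names are assumptions.

```python
import math

def emg(x, mu, sigma, lam):
    """Exponentially modified Gaussian density.

    Gaussian(mu, sigma) convolved with an exponential decay of rate lam,
    which produces a tail towards high x.
    """
    arg = (mu + lam * sigma ** 2 - x) / (math.sqrt(2.0) * sigma)
    scale = 0.5 * lam * math.exp(0.5 * lam * (2.0 * mu + lam * sigma ** 2 - 2.0 * x))
    return scale * math.erfc(arg)

# The peak is asymmetric: more intensity two units above mu than two below.
print(emg(2.0, 0.0, 1.0, 1.0), emg(-2.0, 0.0, 1.0, 1.0))
```

In a nonlinear fit, mu, sigma, lam and an amplitude factor would be the free parameters per peak.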
SNAP : Quality Factor
Idea: Get a value for the quality of a pattern which can be used
instead of S/N or intensity for selecting the “best” peaks
2
Area/Width
Basic Scoring
Fuzzy Scoring
Quality factor
Mean deviation
,  for all patterns
Kind of Spectrum/
Instrument
SNAP : Use Case
From overlapping peak groups
to monoisotopic masses
Wavelet Methods for Denoising Proteomics Spectra
Denoising by Hard Thresholding
Wavelet Transform -> Hard Thresholding -> Inverse Wavelet Transform
Scale - adaptive Thresholds
Preservation of Position, Shape and Amplitude of major Peaks
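The three steps (wavelet transform, hard thresholding, inverse transform) can be sketched with a Haar wavelet and one threshold per decomposition level. The choice of the Haar wavelet, the median-based noise estimate and the factor k = 3 are assumptions made for this illustration; the slides do not say which wavelet or threshold rule was used.

```python
import numpy as np

def haar_forward(signal, levels):
    """Multi-level Haar transform; len(signal) must be divisible by 2**levels."""
    approx = np.asarray(signal, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details

def haar_inverse(approx, details):
    """Exact inverse of haar_forward."""
    for detail in reversed(details):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx

def denoise_hard(signal, levels=3, k=3.0):
    """Zero every detail coefficient below a scale-adaptive threshold."""
    approx, details = haar_forward(signal, levels)
    kept = []
    for d in details:
        sigma = np.median(np.abs(d)) / 0.6745   # robust noise estimate per scale
        kept.append(np.where(np.abs(d) > k * sigma, d, 0.0))
    return haar_inverse(approx, kept)
```

Hard thresholding leaves the surviving coefficients unchanged, which is what preserves the position and amplitude of large peaks.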
Denoising by Hard Thresholding
Further Developments
• Baseline Correction
• Deconvolution of Isotopic Patterns
• Scale-Energy Parameters for enhanced Clustering
Charge Deconvolution : Without Isotopic Resolution
Charge states for ESI
Protein: Z = 15-70
Peptide: Z = 1,2,3,4
Small molecules: Z = 1
Different m/z peaks of Equine Apomyoglobin Protein
MW is calculated from m/z differences between adjacent
peaks by deconvolution software (result see inset).
Related Ion Deconvolution:
Peak Picking (m/z; intensity) -> Deconvolution (envelope; distances) -> Result (Z + MW)
[Figure: ESI spectrum over m/z 800 to 1400 (horizontal axis [M+zH]z+/z) showing the charge-state series M12+ to M20+ with peaks labelled 849.1, 893.7, 943.0, 998.1, 1130.7, 1211.5, 1304.7 and 1413.6; inset: deconvoluted mass M = 16950.584 on an axis from 16930 to 16970]
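The calculation of charge and MW from two adjacent peaks of one charge envelope can be written out as a short sketch. It assumes that both peaks are protonated ions [M+zH]z+ differing by exactly one charge; the function name `deconvolve_pair` is hypothetical, and the two m/z values are peak labels from the spectrum above.

```python
PROTON = 1.00728  # mass of a proton in u

def deconvolve_pair(mz_high, mz_low):
    """Charge and neutral mass from two adjacent peaks of one envelope.

    mz_high carries charge z, mz_low carries charge z + 1, so
    (mz_low - PROTON) / (mz_high - mz_low) = z.
    """
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass

z, mass = deconvolve_pair(1211.5, 1130.7)
print(z, round(mass, 1))
```

A single pair gives only a rough mass because the peak labels are rounded; deconvolution software combines all peaks of the envelope.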
Charge Deconvolution: Isotopic Resolution
For isotopically resolved patterns the charge state and the
mass can be determined from a single pattern.
(M+5H)5+ at m/z 1148: d (m/z) = 0.2 u
(M+4H)4+ at m/z 1434: d (m/z) = 0.25 u
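With isotopic resolution the charge follows from the spacing of the isotopic peaks, which is about 1/z in m/z. A minimal sketch, assuming a spacing of 1.00335 u between neighbouring isotopic species (the 13C/12C mass difference) and protonated ions; the helper names are hypothetical:

```python
PROTON = 1.00728        # proton mass in u
ISOTOPE_STEP = 1.00335  # assumed isotopic spacing (13C - 12C) in u

def charge_from_spacing(spacing_mz):
    """Charge state from the m/z distance between isotopic peaks."""
    return round(ISOTOPE_STEP / spacing_mz)

def neutral_mass(mz, z):
    """Neutral mass of an [M+zH]z+ ion observed at mz."""
    return z * (mz - PROTON)

for mz, spacing in ((1148.0, 0.2), (1434.0, 0.25)):
    z = charge_from_spacing(spacing)
    print(z, round(neutral_mass(mz, z)))
```

Both patterns lead to roughly the same neutral mass, as expected for two charge states of one molecule; the m/z values used are the approximate positions given on the slide.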
Problems to be solved by Bioinformatics
Get more
accurate data
Calibration
Automatic „Smart“ Calibration
Mass distribution
of peptides
Contaminants,
self digestion
External calibration
spots
Statistical References
Internal Calibrants
External Calibration
Automatic “Smart” Calibration
• Automatic control based on external and internal data
• Resulting accuracy <10 ppm
• High precision correction improves stability & accuracy
Tof(m/z) = c0 + c1 (m/z)^(1/2) + c2 (m/z)
+ fixed high precision correction
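The three-term calibration function can be fitted to calibrant peaks by linear least squares, since it is linear in c0, c1 and c2. The sketch below uses NumPy and made-up coefficients and calibrant masses; it leaves out the fixed high precision correction, whose form is not given here. The function names are hypothetical.

```python
import numpy as np

def fit_calibration(mz, tof):
    """Least-squares fit of tof = c0 + c1*sqrt(m/z) + c2*(m/z)."""
    mz = np.asarray(mz, dtype=float)
    design = np.column_stack([np.ones_like(mz), np.sqrt(mz), mz])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(tof, dtype=float), rcond=None)
    return coeffs

def tof_to_mz(tof, coeffs):
    """Invert the calibration: solve c2*s^2 + c1*s + (c0 - tof) = 0, s = sqrt(m/z)."""
    c0, c1, c2 = coeffs
    tof = np.asarray(tof, dtype=float)
    disc = c1 ** 2 - 4.0 * c2 * (c0 - tof)
    s = (-c1 + np.sqrt(disc)) / (2.0 * c2)   # assumes c2 != 0 and c1 > 0
    return s ** 2

true_c = (0.5, 2.0, 1e-4)   # made-up coefficients, arbitrary time units
ref_mz = np.array([842.51, 1045.56, 1475.79, 2211.10, 2383.95, 2705.16])
ref_tof = true_c[0] + true_c[1] * np.sqrt(ref_mz) + true_c[2] * ref_mz
coeffs = fit_calibration(ref_mz, ref_tof)
print(tof_to_mz(ref_tof, coeffs))
```

At least three calibrant peaks are needed to determine the three coefficients; more peaks average out individual peak-picking errors.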
Statistical Calibration for Proteomics
Peaklist + Statistical Reference Masses
-> Assign Masses (dM < dErr)
-> Calibrate
-> dErr := Max(50, 0.5*dErr)
-> dErr>=50? Yes: assign masses again; No: Stop
• Initial Error dErr<500 ppm
• Using modified Mann’s clustering
• Resulting Accuracy <20ppm
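The loop of assigning masses, calibrating and tightening the tolerance can be sketched as follows. This is a simplified illustration under several assumptions: peaks are matched to the nearest reference mass instead of using the modified Mann's clustering, the calibration is a straight line, and the loop stops after the pass at the 50 ppm floor. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def statistical_calibration(measured, references, start_ppm=500.0, floor_ppm=50.0):
    """Iteratively assign peaks to reference masses and refit a linear correction."""
    measured = np.asarray(measured, dtype=float)
    references = np.asarray(references, dtype=float)
    coeffs = np.array([1.0, 0.0])            # start with the identity mapping
    tol = start_ppm
    while True:
        corrected = np.polyval(coeffs, measured)
        # nearest reference mass for every corrected peak
        idx = np.abs(references[None, :] - corrected[:, None]).argmin(axis=1)
        nearest = references[idx]
        assigned = np.abs(corrected - nearest) / nearest * 1e6 < tol
        if assigned.sum() >= 2:
            coeffs = np.polyfit(measured[assigned], nearest[assigned], 1)
        if tol <= floor_ppm:                 # assumed stop rule: last pass at the floor
            break
        tol = max(floor_ppm, 0.5 * tol)      # dErr := Max(50, 0.5*dErr)
    return coeffs

refs = np.array([842.5094, 1045.5637, 1475.7854, 2211.1040, 2383.9520])
peaks = refs * (1.0 + 200e-6) + 0.01         # synthetic miscalibrated peak list
coeffs = statistical_calibration(peaks, refs)
print(np.polyval(coeffs, peaks))
```

Halving the tolerance in each pass lets the first, coarse calibration capture peaks that are far off, while later passes reject false assignments.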
Details of the Calibration Routine:
Internal Multipoint Calibration – an Example

Matching with contaminants (exclusion limit 800 ppm)
measured mass [Da]   error [ppm]
843.0081             591
903.9288             -
1023.2356            -
1046.1874            596
1062.1533            556
1068.1865            592
1077.9011            653
1119.1784            -
1242.4039            -
1273.4572            -
1303.4928            597
1317.4594            -
1431.6357            -
1476.6355            600
1749.5326            -
1805.0227            -
1821.0056            -
1827.9984            -
1844.0284            -
1925.1300            -
1929.1918            -
1942.1387            -
2212.5501            654
2226.5907            661
2240.6103            658
2274.5346            657
2299.6929            225
2385.5507            670
2422.7973            -
2430.9228            -
2718.8983            -
-> calibration, reject unmatched masses

1. Calibration round (exclusion limit 150 ppm), average error: 66.7 ppm
measured mass [Da]   error [ppm]
842.4952             -18
1045.5582            -6
1061.5150            -45
1067.5447            -9
1077.2538            52
1302.7164            1
1475.7600            7
2211.2533            67
2225.2859            74
2239.2975            72
2273.2024            71
2298.3462            -361
2384.1549            85
-> calibration, reject inaccurate masses

2. Calibration round (exclusion limit 40 ppm), average error: 16.3 ppm
measured mass [Da]   error [ppm]
842.5338             28
1045.5679            4
1061.5225            -38
1067.5513            -3
1077.2590            57
1302.6896            -20
1475.7086            -28
2211.0974            -3
2225.1280            4
2239.1376            1
2273.0376            -2
2383.9745            9
-> calibration, reject inaccurate masses

Final calibration, average error: 13.4 ppm
measured mass [Da]   error [ppm]
842.5469             44
1045.5792            14
1061.5336            -27
1067.5623            8
1302.6984            -13
1475.7158            -23
2211.0978            -3
2225.1283            4
2239.1377            1
2273.0374            -2
2383.9732            9
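The rejection step between calibration rounds amounts to a simple filter on the ppm errors. The sketch below applies the 40 ppm exclusion limit to the error values listed for the second calibration round; the helper names are hypothetical, and the recalibration that follows the rejection is not reproduced.

```python
def mean_abs_error(errors_ppm):
    """Average absolute mass error in ppm."""
    return sum(abs(e) for e in errors_ppm) / len(errors_ppm)

def apply_exclusion(errors_ppm, limit_ppm):
    """Keep only peaks whose absolute error is within the exclusion limit."""
    return [e for e in errors_ppm if abs(e) <= limit_ppm]

round_2 = [28, 4, -38, -3, 57, -20, -28, -3, 4, 1, -2, 9]
kept = apply_exclusion(round_2, 40)
print(len(round_2), "->", len(kept), "calibrants;",
      f"average error before rejection: {mean_abs_error(round_2):.1f} ppm")
```

Only the peak with an error of 57 ppm exceeds the limit, which leaves eleven calibrants for the final calibration.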
Iterative Generation of internal calibrant list
Start of PMF identification with a default calibrant list
Calibration -> PMF search -> Generation of an improved calibrant list
(usually 2 repeats are sufficient)
The default calibrant list usually consists of three typical
trypsin peptides
Improved calibrant lists typically contain 60-100 masses;
on average, 10-20 of these can be found in a spectrum
Problems to be solved by Bioinformatics
Search Engines
MS based
Identity Search
MS Protein Identification is Probability based
How closely does a given protein or peptide sequence match
the measured masses?
There are several strategies for a matching “score”:
For example:
-Probability based MOWSE score (Mascot)
-Bayesian probability (ProFound)
-Cross correlation (MS-Fit)
Masses determined by MS are not unique
Identification is probability based
Problem of assigning true probabilities
to a given identification
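The matching problem behind all of these scores can be made concrete with a deliberately naive sketch: digest a candidate sequence in silico, compute the [M+H]+ masses of its peptides and count how many measured masses agree within a tolerance. This counting score is only an illustration and is not the MOWSE, Bayesian or any other published scheme; the 50 ppm tolerance and all function names are assumptions. The zero-width split requires Python 3.7 or later.

```python
import re

RESIDUE = {  # monoisotopic residue masses in u
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728

def tryptic_peptides(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in peptide) + WATER + PROTON

def count_matches(sequence, measured, tol_ppm=50.0):
    """Number of measured masses explained by a tryptic peptide of the sequence."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(sequence)]
    return sum(
        any(abs(m - t) / t * 1e6 <= tol_ppm for t in theoretical)
        for m in measured
    )

print(tryptic_peptides("AKRPGK"), count_matches("AKRPGK", [218.1499, 500.0]))
```

Because many database sequences produce some matches by chance, a raw count like this has to be turned into a probability or significance value, which is where the search engines differ.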
Evaluation of PMF and Search Engines
Part 1
Comparison of the performance of the search
engines using a typical set of search parameters.
Part 2
Successively changing various search parameters
to test their influence. Optimisation of search
parameters.
Dataset:
168 MALDI PMF spectra
the data was acquired in the environment of a typical
proteome project
About 10,000 searches have been performed to establish
a statistical basis
Comparison of PMF Search Engines – Score Distribution
[Figures: histograms of % of searches (0 to 20) against ProFound Z score (0.0 to 2.5), Mascot score (0 to 300) and log (MS-Fit MOWSE Score) (0 to 6), with the 5% significance level marked for ProFound and Mascot]

                                             Mascot       MS-Fit       ProFound
Correct identifications                      89 (53%)     55 (32.7%)   90 (53.6%)
Correct identifications above
the 5% significance level                    63 (37.5%)   -            49 (29%)
Correct identifications above
the highest score that has
been obtained from an incorrect
identification                               54 (32.1%)   9 (5.6%)     69 (41.1%)
Converting the Scoring Distribution to a MetaScore
[Figure: ProFound scoring distribution, % of searches (0 to 20) against ProFound Z score (0.0 to 2.5), showing random matches, correct identifications, the 5% significance level and the range of uncertainty]
Idea:
Integration of search results from
different engines could improve
significance and confidence!
[Figure: MetaScore (0 to 100) against ProFound Z Score (0 to 3)]
An effective ranking of results
can be assessed by
individual search score
distributions
Ranking of Search Results of
different PMF algorithms by MetaScore
- Effective sorting of reported results of several search engines
- More correct proteins are on rank number one
- Elimination of false positives
- Drawback: MetaScore does not reflect true probabilities
Problems to be solved by Bioinformatics
Search Engines
Automated
validation
of Search Results
From Automation to High Throughput
[Workflow diagram: PMF -> Result judgement (Fuzzy Engine, MetaScoring) -> Identified? Yes: Result visualization (MTP-Viewer); No: List of precursor masses -> MS/MS (Auto MS/MS definition, search result driven, queries)]
Fuzzy Engine for Protein Identification from PMF spectra
Identified
Identified (multiple)
Probability Score
Undefined
Uncertain (unique)
Uncertain (multiple)
Probability Score
Score Ratio
to unrelated Sequence
Sequence Coverage
Correlation Coefficient
Peak Quality Factor
FL
Bad data
Problems to be solved by Bioinformatics
Automation &
High Throughput
Automated
MS/MS Precursor Ion
Selection
Strategies for automated MS/MS acquisition
Acknowledgement
Bruker Daltonik
Jens Decker, Michael Kuhn
Martin Blüggel, Daniel Chamrad
Peter Maaß
Kristian Bredies