Bioinformatics Research Group at SRI International

Data Acquisition and Analysis in Mass Spectrometry-Based Metabolomics
Pavel Aronov
BioCyc workshop
October 27, 2010
Outline
- Fundamentals of Mass Spectrometry
- Data Acquisition and Analysis in GC-MS-Based Metabolomics
- Data Acquisition and Analysis in LC-MS-Based Metabolomics
How to analyze tryptophan or any other metabolite?
The two most common techniques in analytical chemistry for determining or confirming a chemical structure:
- Nuclear magnetic resonance, NMR (1940s, Felix Bloch at Stanford University): excellent structural information
- Mass spectrometry (1900s, J. J. Thomson at Cambridge University): excellent sensitivity
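What is a mass spectrometer?
[Figure: mass spectrometer schematic spanning atmosphere and vacuum regions. Neutral molecules (M) are ionized in the ion source (M+), then the ions pass through the mass analyzer to the detector.]
Measured value: mass-to-charge ratio (m/z)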
Mass Units
Unit of mass: 1/12 of the mass of a carbon-12 atom (1 u, or 1 Da)
Unit of mass-to-charge ratio: 1 Da / z = 1 Th (thomson), e.g., m/z 205
For metabolites z is usually 1, so 1 Da is equivalent to 1 Th.
Monoisotopic vs. Average Mass
Statistically, a tryptophan (C11H12N2O2) molecule can contain:
- no carbon-13 (M): 204.09 Da (100 %)
- one carbon-13 (M+1): 205.09 Da (11.9 %)
- two carbon-13 atoms (M+2): 206.09 Da (1.4 %)
Each value is the exact mass of a single isotopic species. The M peak (all carbon-12) gives the monoisotopic mass.
[Figure: structure of tryptophan, C11H12N2O2]
Two pairs of stable isotopes are important in biochemistry:
- Carbon-12 (100 %) and carbon-13 (~1.1 %)
- Sulfur-32 (100 %) and sulfur-34 (4.4 %)
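The M+1 percentage can be estimated from the carbon count alone. This is a minimal sketch. It assumes a carbon-13 abundance of 1.07 % and ignores all other heavy isotopes. Under those assumptions, the number of 13C atoms in one molecule follows a binomial distribution:

```python
from math import comb

P_C13 = 0.0107  # assumed natural abundance of carbon-13 (the slide rounds to ~1.1 %)


def carbon13_pattern(n_carbons: int, max_heavy: int = 2, p: float = P_C13) -> list[float]:
    """Intensities of M, M+1, M+2, ... relative to M (= 100 %), from 13C only.

    The number k of 13C atoms among n carbons is binomially distributed:
    P(k) = C(n, k) * p**k * (1 - p)**(n - k).
    """
    probs = [comb(n_carbons, k) * p**k * (1 - p) ** (n_carbons - k)
             for k in range(max_heavy + 1)]
    return [100.0 * pk / probs[0] for pk in probs]


# Tryptophan, C11H12N2O2, has 11 carbons
pattern = carbon13_pattern(11)
print([round(x, 1) for x in pattern])  # [100.0, 11.9, 0.6]
```

The M+1 value reproduces the 11.9 % quoted above. Carbon alone gives only about 0.6 % for M+2. The rest of the observed M+2 comes from other heavy isotopes (e.g., 18O), which this sketch ignores.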
[Figure: theoretical isotope pattern of C11H12N2O2: m/z 204.09 (100 %), 205.09, 206.10]
Average mass = (204.09 x 100 + 205.09 x 11.9 + 206.09 x 1.4) / 113.3 = 204.22 (molecular weight, g/mol)
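The average mass above is an abundance-weighted mean. This short sketch uses only the three tryptophan peaks listed on this slide, so the weights sum to 113.3:

```python
def average_mass(peaks: list[tuple[float, float]]) -> float:
    """Abundance-weighted mean of (mass, relative abundance) pairs."""
    total = sum(abundance for _, abundance in peaks)
    return sum(mass * abundance for mass, abundance in peaks) / total


# (exact mass in Da, relative abundance in %) for M, M+1 and M+2 of tryptophan
tryptophan_peaks = [(204.09, 100.0), (205.09, 11.9), (206.09, 1.4)]
print(round(average_mass(tryptophan_peaks), 2))  # 204.22
```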
Mass defect
Particle and atom masses:
- 1H (p + e-): 1.0078 u
- n: 1.0087 u
- 12C: 12.0000 u
- 14N: 14.0031 u
- 16O: 15.9949 u
Carbon-12: 6 protons, 6 neutrons and 6 electrons
6 x 1.0078 u (6 protons + 6 electrons) + 6 x 1.0087 u (6 neutrons) = 12.0990 u
Mass defect = 12.0990 u - 12.0000 u = 0.0990 u
E = mc²: 0.1 u ≈ 93 MeV (nuclear binding energy)
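This sketch carries the same arithmetic through E = mc². The conversion factor 1 u ≈ 931.494 MeV is a standard physical constant; it does not come from the slide:

```python
# Masses from the slide (u)
M_H_ATOM = 1.0078   # proton + electron
M_NEUTRON = 1.0087
M_C12 = 12.0000
U_TO_MEV = 931.494  # energy equivalent of 1 u (MeV), standard constant


def mass_defect(n_protons: int, n_neutrons: int, atomic_mass: float) -> float:
    """Mass of the separated constituents minus the mass of the bound atom (u)."""
    return n_protons * M_H_ATOM + n_neutrons * M_NEUTRON - atomic_mass


defect = mass_defect(6, 6, M_C12)
energy = defect * U_TO_MEV  # E = mc^2 expressed in MeV
print(f"{defect:.4f} u -> {energy:.1f} MeV")
```

The result, about 92 MeV, is the binding energy of the carbon-12 nucleus. The slide's rounder 0.1 u corresponds to 93 MeV.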
Elemental composition from accurate mass
Atomic masses: 1H 1.0078 u, 12C 12.0000 u, 14N 14.0031 u, 16O 15.9949 u
What is 28 u? N2 (2 x 14 u), CO (12 u + 16 u) or C2H4 (2 x 12 u + 4 x 1 u)?
What is 28.0313 u? [high accuracy]
C2H4 (2 x 12.0000 u + 4 x 1.0078 u ≈ 28.031 u). N2 (28.0062 u) and CO (27.9949 u) do not match.
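The question can be turned into a brute-force search over small C, H, N, O compositions. This sketch uses the rounded atomic masses from the slide. It also applies a ring-plus-double-bond (RDBE) filter to discard radicals such as CH2N; that filter is an added assumption and is not part of the slide:

```python
from itertools import product

# Monoisotopic masses (u) as rounded on the slide
MASSES = {"C": 12.0000, "H": 1.0078, "N": 14.0031, "O": 15.9949}


def formula_string(counts: dict[str, int]) -> str:
    """Render element counts like {'C': 2, 'H': 4} as 'C2H4'."""
    return "".join(el + (str(k) if k > 1 else "") for el, k in counts.items() if k)


def candidate_formulas(target: float, tol: float, max_count: int = 4) -> list[tuple[str, float]]:
    """CHNO compositions whose mass lies within +/- tol of target.

    Only compositions with a non-negative whole-number RDBE
    (C - H/2 + N/2 + 1) are kept, i.e. plausible closed-shell molecules.
    """
    hits = []
    for c, h, n, o in product(range(max_count + 1), repeat=4):
        counts = {"C": c, "H": h, "N": n, "O": o}
        mass = sum(MASSES[el] * k for el, k in counts.items())
        rdbe = c - h / 2 + n / 2 + 1
        if abs(mass - target) <= tol and rdbe >= 0 and rdbe.is_integer():
            hits.append((formula_string(counts), round(mass, 4)))
    return hits


print(candidate_formulas(28.0, tol=0.5))       # nominal mass: N2, CO and C2H4
print(candidate_formulas(28.0313, tol=0.005))  # accurate mass: only C2H4
```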
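High resolution mass spectrometry
[Figure: an isotope cluster near m/z 561 recorded at high resolution (peaks at 561.18, 562.19, 563.20, 564.20) and at nominal resolution (peaks at 561.14, 562.10, 563.06)]
Resolution R = m / Δm, where Δm is the peak width at half maximum (FWHM)
High resolution: FWHM = 0.06 amu, R = 561/0.06 ≈ 9,000
- TOF: 7,000-50,000
- Orbitrap: 10^4-10^5
- FT-ICR: 10^5-10^6
Nominal mass resolution (R < 1,000): FWHM = 0.8 amu, R = 561/0.8 ≈ 700
- Quadrupoles and ion traps, some TOFs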
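Resolution as defined above (m divided by the FWHM) is easy to compute. The second function gives a rough resolution needed to separate two neighboring ions. It is an added illustration using N2 (2 x 14.0031 u = 28.0062 u) and C2H4 (28.0313 u) from the 28 u question. Other resolution conventions (e.g., 10 % valley) would give different numbers:

```python
def resolving_power(mz: float, fwhm: float) -> float:
    """R = m / delta-m, taking delta-m as the full peak width at half maximum."""
    return mz / fwhm


def required_resolving_power(mz1: float, mz2: float) -> float:
    """Approximate R needed to separate two ions: mean m/z over their m/z difference."""
    return (mz1 + mz2) / 2 / abs(mz1 - mz2)


print(round(resolving_power(561, 0.06)))  # high-resolution example
print(round(resolving_power(561, 0.8)))   # nominal-resolution example
print(round(required_resolving_power(28.0062, 28.0313)))  # N2 vs C2H4, roughly 1,100
```

By this criterion, separating N2 from C2H4 needs a resolution of about 1,100. That is just beyond a nominal-resolution instrument at R ≈ 700.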
Mass of an electron becomes important at high accuracies
Two types of ions in mass spectrometry:
Odd-electron (OE) ions, typically generated by electron ionization (GC-MS):
- C11H12N2O2+•: 204.08933 Da (true mass)
- 204.08988 Da if the electron mass (0.00055 Da) is ignored (2.7 ppm error)
Even-electron (EE) ions, typically generated by chemical ionization techniques and electrospray:
- [C11H12N2O2 + H]+ = C11H13N2O2+: 205.09715 Da (true mass)
- 205.09770 Da if the electron mass is ignored (2.7 ppm error)
Modern instruments can achieve < 1 ppm accuracy.
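The masses above can be recomputed from atomic masses. This sketch assumes standard monoisotopic atomic masses and the CODATA electron and proton masses; none of these values appear on the slide:

```python
# Monoisotopic atomic masses and particle masses (u), standard reference values
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
ELECTRON = 0.00054858
PROTON = 1.00727646


def neutral_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass of a neutral molecule from its element counts."""
    return sum(ATOM[el] * n for el, n in formula.items())


def ppm_error(measured: float, true: float) -> float:
    """Relative mass error in parts per million."""
    return (measured - true) / true * 1e6


trp = {"C": 11, "H": 12, "N": 2, "O": 2}
m = neutral_mass(trp)

radical_cation = m - ELECTRON     # OE ion M+. : one electron removed
protonated = m + PROTON           # EE ion [M+H]+ : a bare proton added
naive_radical = m                 # electron mass forgotten
naive_protonated = m + ATOM["H"]  # whole H atom (proton + electron) added by mistake

for name, naive, true in [("M+.", naive_radical, radical_cation),
                          ("[M+H]+", naive_protonated, protonated)]:
    print(f"{name}: true {true:.5f}, naive {naive:.5f}, "
          f"error {ppm_error(naive, true):.1f} ppm")
```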
Identification based on accurate mass
Matching accurate mass and isotopic peak ratio
[Figure: acquired spectrum (FTMS, negative-mode ESI, full MS m/z 70-800, RT 6.29 min) with a peak at m/z 212.00217, assigned C8H6O4NS (-0.64 ppm); theoretical spectrum of C8H6O4NS- (charge -1) at m/z 212.00230; the structure shown carries N-H and OSO3 groups]
Error = (212.00217 - 212.00230) Da / 212.00230 Da x 1,000,000 ≈ -0.6 parts per million (ppm)
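The theoretical m/z of the anion and the ppm error can be reproduced in a few lines. This sketch assumes standard monoisotopic atomic masses, including 31.97207 u for sulfur-32. It adds one electron mass for the negative charge. The 5 ppm acceptance window is an illustrative choice; it is not a value from the slide:

```python
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "S": 31.97207117}
ELECTRON = 0.00054858  # u


def ion_mz(formula: dict[str, int], charge: int) -> float:
    """m/z of an ion with the given element counts and signed charge."""
    mass = sum(ATOM[el] * n for el, n in formula.items())
    mass -= charge * ELECTRON  # cations lose electrons, anions gain them
    return mass / abs(charge)


def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6


anion = {"C": 8, "H": 6, "N": 1, "O": 4, "S": 1}  # C8H6O4NS-, charge -1
theoretical = ion_mz(anion, charge=-1)
measured = 212.00217                              # acquired spectrum
error = ppm_error(measured, theoretical)
accepted = abs(error) <= 5.0                      # hypothetical acceptance window

print(f"theoretical {theoretical:.5f}, error {error:.2f} ppm, accepted: {accepted}")
```

The result differs slightly from the -0.636 ppm reported by the instrument software. The likely cause is that the measured value here is rounded to five decimals.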
Confirmation of structure from isotopes (M+2)
Matching accurate mass and isotopic peak ratio
[Figure: M+2 region (m/z 213.94-214.08) of the same scan: acquired spectrum with a peak at m/z 213.99796; theoretical spectrum of C8H6O4NS- with a peak at m/z 213.99810]
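Replacing sulfur-32 with sulfur-34 shifts the mass by about 1.9958 u, which gives the theoretical M+2 value. This sketch assumes standard isotope masses and representative sulfur abundances (94.99 % and 4.25 %). Those abundances give a 34S/32S ratio of about 4.5 %, close to the 4.4 % quoted earlier:

```python
S32, S34 = 31.97207117, 33.96786700  # isotope masses (u)
AB_S32, AB_S34 = 0.9499, 0.0425      # assumed representative natural abundances


def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6


mono_mz = 212.00230                  # theoretical m/z of C8H6O4NS- (M peak)
m_plus_2 = mono_mz + (S34 - S32)     # same ion carrying 34S instead of 32S
observed = 213.99796                 # acquired M+2 peak

# Expected 34S peak height relative to M for a molecule with one sulfur atom
expected_ratio = AB_S34 / AB_S32

print(f"34S peak m/z {m_plus_2:.5f}, error {ppm_error(observed, m_plus_2):.2f} ppm, "
      f"expected M+2/M {100 * expected_ratio:.1f} %")
```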
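Tandem Mass Spectrometry
[Figure: tandem mass spectrometer schematic: HPLC → ion source (atmosphere) → mass analyzer 1 (passes precursor ion M+) → collision cell (M+ fragments to F+) → mass analyzer 2 (fragment ions F+) → detector, all under vacuum after the ion source]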
MS/MS of isomers
Prostaglandin A1 and prostaglandin B1 (both C20H32O4) have the same accurate mass, 336.2301 amu; their MS/MS spectra can be used to distinguish them.
Chromatography
Separation by volatility and polarity (gas chromatography, GC) or by polarity (liquid chromatography, LC)
[Figure: gas chromatography of hydrocarbons; peaks labeled C8 to C30 elute in order of increasing carbon number (time axis 6-36 min)]
Two-dimensional metabolomics data in LC-MS and GC-MS (retention time x m/z)
GC-MS and LC-MS
GC-MS:
- Derivatization usually required (except for volatile organic compounds, VOCs)
- Upper mass limit at ~400-500 amu
- Preferred for small polar metabolites (primary metabolism)
- Relatively high peak capacity
- EI ion source (extensive fragmentation, reproducible, libraries available)
- CI ion source (little fragmentation, an advantage for accurate mass measurement)
LC-MS:
- No derivatization usually required
- Upper mass limited by column permeability
- Preferred for bigger molecules (e.g., some lipids, secondary metabolites)
- Relatively low peak capacity
- ESI ion source (ionic compounds, ion suppression)
- APCI ion source (less ion suppression and more amenable to non-polar compounds than ESI, but usually lower sensitivity)
Types of Experiments in Metabolomics
Targeted:
• Number of analyzed metabolites is limited by the number of available standards
• Absolute quantitation of metabolites (nM, mg/mL)
• Selective MS detectors (quadrupoles, triple quadrupoles)
Non-targeted:
• Number of analyzed metabolites is limited by the capacity of the analytical instrumentation
• Relative quantitation of metabolites (fold change)
• Scanning MS detectors (ion trap, TOF, FT)
Bottlenecks in Metabolomics
ASMS09 survey: metabolomics bottlenecks
1. Identification of metabolites: 35%
2. Assigning biological significance: 22%
3. Data processing/reduction: 14%
4. Sample preparation: 8%
5. No opinion: 6%
6. Statistical analysis: 5%
7. Validation/utility studies: 5%
8. Data acquisition/throughput: 3%
9. Other: 2%
Data acquisition/throughput (3%) vs. post-acquisition bottlenecks (statistical analysis, identification, biological significance and data processing: 5 + 35 + 22 + 14 = 76%)
GC-MS-based metabolomics: overview
- 50-600 amu mass range (analytes typically below ~400 amu)
- Mono- and disaccharides, amino acids, fatty acids (mostly primary metabolites)
- Derivatization usually required
GC-MS: derivatization
Methoximation: 40 mg/mL methoxyamine in pyridine at 37 °C for 90 min
- Prevents α-keto acids from thermal decarboxylation
- Keeps sugars in the open-chain form, minimizing the number of conformations and relieving steric hindrance for the next step
[Figure: methoximation of a sugar: the cyclic α/β epimers are converted into the open-chain O-methyloxime (C=N-OCH3), which forms syn/anti isomers]
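GC-MS: derivatization
Silylation: MSTFA (N-methyl-N-(trimethylsilyl)trifluoroacetamide) with 1% TMCS (trimethylchlorosilane) at 50 °C for 30 min
- Substitution of active hydrogens
- Incomplete derivatization possible
[Figure: MSTFA replaces the active hydrogens of OH and NH2 groups with trimethylsilyl groups, -Si(CH3)3]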
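Each derivatization step adds a predictable mass. This matters when reading spectra of derivatives such as glycine-2TMS in the application example. The sketch assumes standard monoisotopic atomic masses. A TMS group replacing an active hydrogen adds C3H8Si. Methoximation of a carbonyl (C=O to C=N-OCH3) adds CH3N:

```python
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "Si": 27.97692653}


def mono(formula: dict[str, int]) -> float:
    """Monoisotopic mass from element counts."""
    return sum(ATOM[el] * n for el, n in formula.items())


TMS_SHIFT = mono({"C": 3, "H": 8, "Si": 1})  # -H replaced by -Si(CH3)3
MOX_SHIFT = mono({"C": 1, "H": 3, "N": 1})   # C=O -> C=N-OCH3 (+CH5NO, -H2O)


def derivatized_mass(formula: dict[str, int], n_tms: int = 0, n_mox: int = 0) -> float:
    """Neutral monoisotopic mass after n_tms silylations and n_mox methoximations."""
    return mono(formula) + n_tms * TMS_SHIFT + n_mox * MOX_SHIFT


glycine = {"C": 2, "H": 5, "N": 1, "O": 2}
print(f"glycine {mono(glycine):.4f} -> glycine-2TMS {derivatized_mass(glycine, n_tms=2):.4f}")
```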
GC-MS data analysis
[Figure: GC-MS total ion chromatogram (Scan EI+, TIC) with many resolved peaks between about 9 and 50 min]
Electron Ionization in GC-MS
- 70 eV >> energy of a chemical bond (a few eV)
- Highly reproducible
- Extensive fragmentation
- Often no molecular ion observed
EI: alpha-cleavage [a] more common
CID MS/MS: inductive cleavage [i] common
[Figure: alpha-cleavage (a) and inductive cleavage (i) of an alcohol (OH and protonated OH2 groups)]
GC-MS: present and future
Current GC-MS metabolomics platforms use:
1) nominal resolution mass analyzers
(no accurate mass and elemental composition)
2) electron ionization ion source
OE molecular ions, extensive fragmentation,
often molecular ion is not observed
Advantages:
1) Low cost
2) Good chromatographic separation for many small polar
metabolites after derivatization
3) Extensive libraries of fragmentation spectra help identification
4) Retention time is to some extent predictable (retention indices)
Trends:
1) Development of high resolution instruments for GC/MS
2) Development of soft ionization sources similar to LC/MS
(EE ions, no fragments)
GC-MS data analysis
- Deconvolution of mass spectra based on chromatographic profiles (e.g., the freeware AMDIS)
- Identification of metabolites by matching to spectral libraries and retention indices
- Automated processing routines exist for some GC-MS instruments (SetupX and BinBase)
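Both identification criteria can be sketched briefly. The retention index below is the common linear (temperature-programmed) form, interpolated between n-alkanes. The spectral score is a plain cosine similarity. AMDIS and library search programs use their own weighted match factors, so the scores are not interchangeable. All retention times and spectra below are hypothetical:

```python
from math import sqrt


def retention_index(rt: float, alkane_rts: dict[int, float]) -> float:
    """Linear retention index from an n-alkane ladder {carbon number: retention time}.

    RI = 100 * (n + (N - n) * (rt - t_n) / (t_N - t_n)) for neighbouring
    ladder alkanes n < N whose retention times bracket rt.
    """
    carbons = sorted(alkane_rts)
    for n, n_next in zip(carbons, carbons[1:]):
        t_n, t_next = alkane_rts[n], alkane_rts[n_next]
        if t_n <= rt <= t_next:
            return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))
    raise ValueError("retention time outside the alkane ladder")


def cosine_score(spec_a: dict[int, float], spec_b: dict[int, float]) -> float:
    """Cosine similarity of two spectra given as {nominal m/z: intensity}."""
    dot = sum(spec_a[mz] * spec_b[mz] for mz in set(spec_a) & set(spec_b))
    norm_a = sqrt(sum(v * v for v in spec_a.values()))
    norm_b = sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


ladder = {10: 8.0, 11: 9.5, 12: 11.0}                 # hypothetical alkane retention times (min)
acquired = {73: 100.0, 102: 80.0, 147: 30.0}          # hypothetical deconvoluted spectrum
library = {73: 90.0, 102: 85.0, 147: 25.0, 204: 5.0}  # hypothetical library entry

print(retention_index(8.75, ladder))
print(round(cosine_score(acquired, library), 3))
```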
Application Examples
[Figure: GC-MS total ion chromatograms (Scan EI+, TIC, 6.0-12.5 min) of a sterile control without Listeria cells (- cells, sample ARONOVP_100819_SAMPLE004_STER) and of a Listeria sample (+ cells, ARONOVP_100819_SAMPLE005_LIST); glycine-2TMS elutes at 6.16 min, and a peak at 10.67 min dominates the + cells trace]
Application Examples: AMDIS
[Figure: AMDIS view of the peak of interest; the acquired mass spectrum is compared with the library mass spectrum of glycine-2TMS]
LC-MS-based metabolomics
- A combination of ionization modes is preferred (ESI and APCI, positive and negative)
- Reversed-phase LC for non-polar metabolites and hydrophilic interaction chromatography (HILIC) for polar metabolites
- Detection of spectral "features" (ions) using metabolomics software
- Identification based on accurate mass and fragmentation (MS/MS libraries)
Electrospray Ionization (ESI) and APCI
Positive ESI: R + H+ → [R + H]+
Negative ESI: R → [R - H]- + H+
Both ESI and APCI:
- Soft ionization, pseudomolecular ions [M + H]+, [M - H]-, [M + Na]+, [M + Cl]-
- Volatile mobile phase, no inorganic salts (e.g., phosphate buffer)
APCI:
- Ionization in the gas phase
- High ionization efficiency for compounds with high proton affinity in the gas phase
- Usually singly charged ions
- Compatible with reversed and normal phase
ESI:
- Ionization in the liquid phase
- High ionization efficiency for compounds that are ionic in solution
- Multiply charged ions common for large biomolecules (proteins, nucleic acids)
- Reversed phase; mobile phase must be conductive
- Ion suppression common
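The m/z of each common pseudomolecular ion follows from the neutral monoisotopic mass. This sketch covers singly charged ions and assumes standard atomic and particle masses. [M + NH4]+ is included because it appears in the adduct example later:

```python
PROTON = 1.00727646
ELECTRON = 0.00054858
H, N, NA, CL = 1.00782503, 14.0030740, 22.98976928, 34.96885268  # atomic masses (u)

# Mass added to the neutral molecule M for each singly charged adduct ion
ADDUCT_SHIFTS = {
    "[M+H]+": PROTON,
    "[M+NH4]+": N + 4 * H - ELECTRON,
    "[M+Na]+": NA - ELECTRON,
    "[M-H]-": -PROTON,
    "[M+Cl]-": CL + ELECTRON,
}


def adduct_mz(neutral_mass: float, adduct: str) -> float:
    """m/z of a singly charged adduct ion of a neutral molecule."""
    return neutral_mass + ADDUCT_SHIFTS[adduct]


TRYPTOPHAN = 204.08987756  # monoisotopic mass of C11H12N2O2 (u)
for name in ADDUCT_SHIFTS:
    print(f"{name:9s} {adduct_mz(TRYPTOPHAN, name):.5f}")
```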
Combination of Acquisition Modes
Separation modes: Reversed phase and HILIC
Ionization modes: ESI and APCI or combined ESI/APCI (MM)
Ionization polarities: + and -
Nordstrom A. et al, Anal Chem, 2008.
RP and HILIC liquid chromatography
[Figure: TIC chromatograms (0-10 min) of samples analyzed on a reversed-phase C18 column and on an aminopropyl HILIC column; creatinine (m/z 114.07) elutes at 1.12 min on C18 but at about 2.55 min on HILIC]
HILIC: better retention for polar molecules
LC-MS: Data Analysis
- Alignment of chromatograms (optional)
- Detection of "features" in mass chromatograms
- Removal of isotopic peaks, adducts, fragments, etc., to improve statistics
- Statistical analysis
- Identification based on accurate mass, MS/MS spectra and comparison with standards
Example: search for bacterial metabolites in humans by comparing two groups, controls and people who had undergone colectomy (no colon bacteria)
- Initially, the software detected 900 features in positive ESI mode
- After features with a missing chromatographic profile were removed (visual inspection), 769 features were left
- After isotopes were removed, 554 features were left; only at this point are these features likely to be molecular ions of individual metabolites
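The isotope-removal step can be sketched as follows. A feature is flagged when it co-elutes with a more intense feature about 1.00335 u (the 13C-12C mass difference) lower in m/z. The m/z values are the M + H, M + NH4 and M + Na ions and two 13C peaks from the Adducts spectrum that follows. The intensities and both tolerances are hypothetical. Real processing software also checks isotope ratios and charge states:

```python
C13_SHIFT = 1.0033548  # mass difference between 13C and 12C (u)

Feature = tuple[float, float, float]  # (m/z, retention time in min, intensity)


def remove_isotope_features(features: list[Feature],
                            ppm_tol: float = 5.0,
                            rt_tol: float = 0.05) -> list[Feature]:
    """Drop features that look like 13C isotope peaks of a co-eluting feature.

    A feature counts as an isotope peak when another, more intense feature
    elutes within rt_tol minutes and lies one 13C shift lower in m/z (within
    ppm_tol). Only singly charged ions and M+1 peaks are considered.
    """
    kept = []
    for mz, rt, intensity in features:
        is_isotope = any(
            abs(rt - rt0) <= rt_tol
            and abs(mz - (mz0 + C13_SHIFT)) <= mz * ppm_tol * 1e-6
            and intensity0 > intensity
            for mz0, rt0, intensity0 in features
        )
        if not is_isotope:
            kept.append((mz, rt, intensity))
    return kept


features = [
    (399.1856, 15.59, 1.0e6), (400.1888, 15.59, 2.2e5),
    (416.2120, 15.59, 8.0e5), (417.2153, 15.59, 1.8e5),
    (421.1673, 15.61, 5.0e5),
]
print([mz for mz, _, _ in remove_isotope_features(features)])
```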
Adducts
[Figure: extracted ion chromatograms (C18, positive mode, RT 14.99-16.15 min) and averaged mass spectrum (RT 15.51-15.69 min) of one sample: co-eluting ions at m/z 399.1856 (M + H), 416.2120 (M + NH4) and 421.1673 (M + Na), all with apex near 15.6 min, with 13C isotope peaks at m/z 400.1888 and 417.2153]
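Because the three ions co-elute, they can be tested as adducts of one neutral molecule. The test subtracts each adduct's mass shift and checks that the implied neutral masses agree. This sketch assumes standard atomic masses and a hypothetical 5 ppm tolerance:

```python
from typing import Optional

PROTON = 1.00727646
ELECTRON = 0.00054858
SHIFTS = {  # mass added to the neutral molecule M (u)
    "M+H": PROTON,
    "M+NH4": 14.0030740 + 4 * 1.00782503 - ELECTRON,
    "M+Na": 22.98976928 - ELECTRON,
}


def consensus_neutral_mass(ions: dict[str, float], ppm_tol: float = 5.0) -> Optional[float]:
    """Mean neutral mass implied by a set of adduct ions, or None if they disagree."""
    masses = [mz - SHIFTS[adduct] for adduct, mz in ions.items()]
    center = sum(masses) / len(masses)
    if all(abs(m - center) / center * 1e6 <= ppm_tol for m in masses):
        return center
    return None


observed = {"M+H": 399.1856, "M+NH4": 416.2120, "M+Na": 421.1673}
print(consensus_neutral_mass(observed))
```

All three ions point to the same neutral mass, about 398.1782 u, to within 1 ppm. This is why they count as one metabolite rather than three features.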
Fragments in LC-MS
[Figure: extracted ion chromatograms for m/z 118.05-118.09 (RT 6.69-8.06 min) with a peak at 7.31 min that coincides with hippuric acid; averaged mass spectrum (RT 7.16-7.52 min) showing hippuric acid at m/z 180.0651 and an ion at m/z 118.0651]
m/z 118.0651, C8H8N: indole? No, it is a fragment of hippuric acid. Indole was not confirmed by GC-MS either.
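The elemental composition of the neutral loss that links the two ions can be checked directly. This sketch assumes hippuric acid is C9H9NO3, so its [M + H]+ is C9H10NO3. It also assumes standard monoisotopic atomic masses. The result shows only the composition of the loss (CH2O3, for example H2O + CO2), not the fragmentation mechanism:

```python
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def mono_mass(formula: dict[str, int]) -> float:
    """Monoisotopic mass from element counts."""
    return sum(ATOM[el] * n for el, n in formula.items())


def formula_difference(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Element counts of a minus b, omitting elements whose counts cancel."""
    elements = sorted(set(a) | set(b))
    diff = {el: a.get(el, 0) - b.get(el, 0) for el in elements}
    return {el: k for el, k in diff.items() if k}


precursor = {"C": 9, "H": 10, "N": 1, "O": 3}  # [hippuric acid + H]+
fragment = {"C": 8, "H": 8, "N": 1}            # C8H8N+, m/z 118.0651

loss = formula_difference(precursor, fragment)
observed_loss = 180.0651 - 118.0651            # difference of the observed m/z values

print(loss, round(mono_mass(loss), 4), round(observed_loss, 4))
```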
Identification tools
- Accurate mass search (BioCyc, HMDB, Metlin)
- MS/MS search (Metlin, MassBank)
- In addition, many MS manufacturers offer proprietary tools for structure elucidation
[Figure: MassBank MS/MS match for a sulfate, with a fragment at m/z 132 (C8H6NO)]
LC-MS Data Analysis Summary
- Not every peak detected by a mass spectrometer represents an individual metabolite
- Automated data processing helps reduce the amount of routine work; however, human intervention is still required
- Accurate mass measurements and MS/MS allow determination of the elemental composition of unknowns and their structural components; confirmation with chemical standards is still required