Additional file 1: Detailed experimental protocols, supplementary analysis of results, and metabolite identification
Method S-1: Detailed Description of Sample Pretreatment and LC/TOF MS Protocols
Serum samples were thawed, and proteins were precipitated by adding acetonitrile to the serum in a 5:1 ratio (1000 µL acetonitrile + 200 µL serum). The mixture was vortexed for 1 minute and incubated at room temperature for 40 minutes; the sample was then centrifuged at 13,000 × g for 15 minutes and the supernatant retained. The supernatant was vacuum-evaporated and the residue reconstituted in 80% acetonitrile/0.1% TFA.
LC/TOF MS analyses were performed on a JEOL AccuTOF mass spectrometer (Tokyo, Japan) coupled to an Agilent 1100 Series LC system (Santa Clara, CA) via an ESI source. The TOF resolving power, measured at full width at half maximum, was 6000, and the observed mass accuracies ranged from 5 to 15 ppm, depending on the signal-to-noise ratio of the particular ion investigated. The LC system was equipped with a solvent degasser, a binary pump, an autosampler, and a thermostatted column compartment (held at 25 °C). The injection volume was 15 µL in all cases. Reversed-phase separation of serum samples was performed using a Symmetry® C18 column (3.5 µm, 2.1 × 150 mm, pore size 100 Å; Waters, Milford, MA) at a flow rate of 150 µL min⁻¹. The analytical column was preceded by a Zorbax® RX-C18 guard column (5.0 µm, 4.6 × 12.5 mm, pore size 2 µm; Agilent). The LC solvent mixtures used were A = 0.1% formic acid in water and B = 0.1% formic acid in acetonitrile. After a pre-run wash and equilibration with 5% B for 15 minutes, data acquisition was started and the solvent composition was varied according to the solvent program described in Table S-1. After analysis of each serum specimen, a 0.20 mM sodium trifluoroacetate (NaTFA) standard was run for mass drift compensation. For the NaTFA analysis, 100% B at a flow rate of 300 µL min⁻¹ was used and data were acquired for 10 minutes. After each injection of a sample or drift-correction standard, the column was washed with 100% B for 30 minutes.
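As a rough worked example of the figures quoted above, the sketch below converts the resolving power (R = m/Δm at FWHM, here 6000) into the expected peak width and turns the 5-15 ppm accuracy range into an absolute m/z window. The function names are illustrative only and are not part of the instrument software.

```python
def peak_fwhm(mz: float, resolving_power: float = 6000.0) -> float:
    """Peak full width at half maximum implied by R = m / delta_m."""
    return mz / resolving_power


def ppm_window(mz: float, ppm: float) -> float:
    """Half-width (in m/z units) of a +/- ppm mass-accuracy window."""
    return mz * ppm * 1e-6


if __name__ == "__main__":
    for mz in (200.0, 500.0, 1000.0):
        print(
            f"m/z {mz:6.1f}: FWHM {peak_fwhm(mz):.4f}, "
            f"5 ppm = +/-{ppm_window(mz, 5):.4f}, "
            f"15 ppm = +/-{ppm_window(mz, 15):.4f}"
        )
```

At m/z 500, for instance, a peak is about 0.083 m/z units wide at half height, while a 15 ppm error corresponds to only ±0.0075, so the quoted mass accuracies refer to locating the peak centroid to a small fraction of its width.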
Table S-1: LC solvent gradient used in metabolomic experiments.
Time (min) | %B (acetonitrile/0.1% formic acid) | Flow Rate (µL min⁻¹)
Pre-Run Column Equilibration
0.0 | 100 | 300
10.0 | 5 | 150
15.0 | 5 | 150
Sample Run
0.0 | 5 | 150
5.0 | 5 | 150
10.0 | 20 | 150
20.0 | 25 | 150
28.0 | 30 | 150
38.0 | 35 | 150
50.0 | 40 | 150
90.0 | 45 | 150
100.0 | 50 | 150
110.0 | 60 | 150
120.0 | 75 | 150
130.0 | 85 | 150
160.0 | 95 | 150
180.0 | 100 | 150
Post-Run Column Wash
0.0 | 100 | 300
30.0 | 100 | 300
NaTFA Standard Run
0.0 | 100 | 300
10.0 | 100 | 300
Post-Run Column Wash
0.0 | 100 | 300
30.0 | 100 | 300
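The Sample Run segment of Table S-1 can be encoded as (time, %B) set points. The sketch below assumes the pump ramps %B linearly between consecutive set points (the table does not state the ramp shape) and interpolates the mobile-phase composition at any time during the 180-minute run; `SAMPLE_RUN` and `percent_b` are hypothetical names.

```python
from bisect import bisect_right

# (time in min, %B) set points of the "Sample Run" segment in Table S-1.
SAMPLE_RUN = [
    (0.0, 5), (5.0, 5), (10.0, 20), (20.0, 25), (28.0, 30), (38.0, 35),
    (50.0, 40), (90.0, 45), (100.0, 50), (110.0, 60), (120.0, 75),
    (130.0, 85), (160.0, 95), (180.0, 100),
]


def percent_b(t, program=SAMPLE_RUN):
    """%B at time t (min), assuming linear ramps between set points."""
    times = [time for time, _ in program]
    if t <= times[0]:
        return float(program[0][1])
    if t >= times[-1]:
        return float(program[-1][1])
    i = bisect_right(times, t) - 1  # index of the last set point at or before t
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 15, 60, 125, 180):
        print(f"t = {t:3d} min: {percent_b(t):.2f} %B")
```

Under the linear-ramp assumption, 15 minutes into the sample run the eluent would be 22.5% B, halfway between the 10-minute (20%) and 20-minute (25%) set points.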
Spectral data were collected over m/z 100-1750 with a spectral recording interval of 1.5 s and a data sampling interval of 0.5 ns in both positive and negative ion ESI modes. The TOF mass spectrometer settings for positive/negative ion mode were as follows: needle voltage ±2000 V; ring lens +8 V / -9 V; orifice 1 +30 V / -69 V; orifice 2 +6 V / -8 V; desolvation chamber temperature 250 °C; orifice 1 temperature 80 °C; nebulizing gas flow rate 1.0 L min⁻¹; desolvation gas flow rate 2.5 L min⁻¹; and detector voltage ±2800 V. The TOF analyzer pressure was 4.8 × 10⁻⁶ Pa during analysis. The RF ion guide voltage amplitude was swept to ensure adequate transmission of analytes over a wide m/z range, using the following sweep parameters: initial peaks voltage 700 V, initial time 20%, sweep time 50%, final peaks voltage 2500 V. The collected LC/TOF MS data were centroided, mass-drift corrected using the NaTFA reference spectrum, and exported in NetCDF format for further data mining.
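The drift-compensation routine used by the instrument software is not described here. As a generic illustration only, the sketch below fits a linear correction (reference m/z = intercept + slope × measured m/z) by least squares over calibrant ions and applies it to sample peaks. It assumes, hypothetically, positive-mode sodium trifluoroacetate cluster ions [(CF3COONa)n + Na]+ as the reference masses, computed from monoisotopic atomic masses; the actual reference list, and whether a linear model suffices, may differ.

```python
# Monoisotopic masses (Da) used to compute illustrative NaTFA reference ions.
C, F, O, NA = 12.0, 18.99840322, 15.99491461956, 22.9897692809
ELECTRON = 0.00054857990946
NATFA = 2 * C + 3 * F + 2 * O + NA  # neutral CF3COONa


def natfa_cluster_mz(n: int) -> float:
    """m/z of the singly charged cluster ion [(CF3COONa)n + Na]+."""
    return n * NATFA + NA - ELECTRON


def fit_linear_recalibration(measured, reference):
    """Least-squares fit of reference = intercept + slope * measured."""
    k = len(measured)
    mean_x = sum(measured) / k
    mean_y = sum(reference) / k
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, reference))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def recalibrate(mz_values, intercept, slope):
    """Apply the fitted correction to a list of centroided m/z values."""
    return [intercept + slope * mz for mz in mz_values]


if __name__ == "__main__":
    reference = [natfa_cluster_mz(n) for n in range(1, 8)]
    # Simulated drift: +12 ppm scale error plus a 1 mDa offset.
    measured = [m * (1 + 12e-6) + 0.001 for m in reference]
    a, b = fit_linear_recalibration(measured, reference)
    print(recalibrate([500.0], a, b))
```

Because the simulated drift is exactly linear, the fitted correction maps every simulated calibrant back onto its reference mass; real drift may call for a higher-order or piecewise model.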
To ensure maximum reproducibility in the metabolomic experiments, all serum specimens were run consecutively within a 2.5-month period. Every cancer sample was randomly paired with a normal sample, and both were run on the same day so that no temporal bias was introduced in the way samples were analyzed. Sample pairs were run in random order and in duplicate.
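The randomization scheme above can be sketched as follows. This is a minimal illustration under stated assumptions: each cancer sample is paired with a distinct, randomly drawn normal sample; pairs are processed in random order, one block per pair, each block being run within a single day; and "in duplicate" is interpreted as two injections of each sample within its block, in shuffled order. `build_run_schedule` is a hypothetical helper, not software used in the study.

```python
import random


def build_run_schedule(cancer_ids, normal_ids, seed=None):
    """Return a list of (block_number, injection_order) tuples.

    Each block holds one cancer/normal pair, with both samples injected
    twice in random order; the blocks themselves are in random order.
    """
    if len(normal_ids) < len(cancer_ids):
        raise ValueError("need at least as many normal as cancer samples")
    rng = random.Random(seed)
    # Randomly pair each cancer sample with a distinct normal sample.
    normals = rng.sample(list(normal_ids), len(cancer_ids))
    pairs = list(zip(cancer_ids, normals))
    rng.shuffle(pairs)  # random run order of the pairs
    schedule = []
    for block, (cancer, normal) in enumerate(pairs, start=1):
        injections = [cancer, normal, cancer, normal]  # duplicate runs
        rng.shuffle(injections)
        schedule.append((block, injections))
    return schedule


if __name__ == "__main__":
    for block, order in build_run_schedule(["C1", "C2", "C3"], ["N1", "N2", "N3", "N4"], seed=1):
        print(block, order)
```

Fixing `seed` makes a schedule reproducible, so the exact run order can be recorded alongside the data.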
Analysis S-1: Detailed Description of the Prediction and Feature Selection Performance for the
Pos-Ion-Mode and Neg-Ion-Mode Datasets
Table S-2 summarizes the prediction performance of the pos-ion-mode and neg-ion-mode datasets evaluated without feature selection. The neg-ion-mode dataset had better prediction performance than the pos-ion-mode dataset, with the highest prediction performance (81.9%) obtained using the linear SVM classifier. For the pos-ion-mode dataset, the nonlinear SVM classifier generally outperformed the linear SVM classifier, whereas for the neg-ion-mode dataset the linear SVM generally performed better.
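Table S-2 and the tables that follow report three evaluation procedures. The sketch below generates train/test index splits for each: repeated random 52/20 splits (50 trials), 12-fold cross-validation repeated 10 times, and leave-one-out cross-validation (LOOCV). It assumes 52 training and 20 test samples per split (hence 72 samples in the test usage) and unstratified random sampling; neither detail is stated in this section, so treat both as assumptions.

```python
import random


def split_52_20(n_samples, n_train=52, n_test=20, trials=50, seed=0):
    """Repeated random train/test splits (unstratified)."""
    if n_train + n_test > n_samples:
        raise ValueError("not enough samples for the requested split")
    rng = random.Random(seed)
    for _ in range(trials):
        order = rng.sample(range(n_samples), n_train + n_test)
        yield order[:n_train], order[n_train:]


def repeated_kfold(n_samples, k=12, trials=10, seed=0):
    """k-fold cross-validation, reshuffled for each of `trials` repeats."""
    rng = random.Random(seed)
    for _ in range(trials):
        order = rng.sample(range(n_samples), n_samples)  # a random permutation
        for fold in range(k):
            test = order[fold::k]
            held_out = set(test)
            yield [i for i in order if i not in held_out], test


def loocv(n_samples):
    """Leave-one-out: every sample is the test set exactly once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]
```

Each LOOCV model is trained on all but one sample, whereas each 52/20 model sees only 52, which is one reason the procedures can give different accuracy estimates for the same classifier.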
Table S-2: Prediction Performance (%) without Feature Selection
Classifier | 52-20-split Validation (50 trials) | 12-fold CV (10 trials) | LOOCV Accuracy | LOOCV Sensitivity | LOOCV Specificity
pos-ion-mode (n = 360)
SVM | 70 | 71.3 | 72.2 | 64.9 | 80
SVM_NL | 71.8 | 75.6 | 73.6 | 78.4 | 68.6
neg-ion-mode (n = 232)
SVM | 73.2 | 80.4 | 81.9 | 81.1 | 82.9
SVM_NL | 72.4 | 79.9 | 80.6 | 81.1 | 80
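The LOOCV accuracy, sensitivity, and specificity columns follow the usual confusion-matrix definitions. A minimal sketch, assuming binary labels in which cancer is coded as the positive class (the coding is not specified here) and predictions pooled over all left-out samples:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity and specificity (in %) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": 100.0 * (tp + tn) / len(pairs),
        "sensitivity": 100.0 * tp / (tp + fn),  # true-positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true-negative rate
    }


if __name__ == "__main__":
    print(classification_metrics([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 1]))
```

The example call gives 87.5% accuracy, 100% sensitivity, and 75% specificity. Accuracy is the class-size-weighted average of sensitivity and specificity, so it always lies between the two.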
Table S-3 summarizes the prediction performance of the pos-ion-mode and neg-ion-mode datasets after feature selection and gives the number of important features identified for each model. In this setting (Figure 2a), each feature selection method was applied to the whole dataset, and the prediction performance of the dataset containing only the selected feature subset (panel) was then measured using the three evaluation processes. The estimated predictive performance under LOOCV, which is perhaps the most accurate evaluation technique in this low-sample setting, was surprisingly high (greater than 90% for three of the four methods on the neg-ion-mode dataset); note, however, that here the test samples also contributed to feature selection. For both the pos-ion-mode and neg-ion-mode datasets, the features selected by SVMRFE had the best discriminative power, followed by those selected by SVMRFE_NL, while SVMRW performed worst.
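SVMRFE ranks features by the weights of a trained linear SVM and recursively discards the weakest. The sketch below illustrates that idea in plain Python under simplifying assumptions: the linear SVM is trained by full-batch subgradient descent on the L2-regularized hinge loss with no bias term (so features are assumed roughly centered), labels are ±1, and one feature is removed per iteration. The study's actual SVM solver, elimination step size, and stopping rule are not given here, and the function names are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Linear SVM (no bias) via subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)), with y_i in {-1, +1}."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        step = 1.0 / (lam * t)  # Pegasos-style decreasing step size
        grad = [lam * wj for wj in w]
        for xi, yi in zip(X, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1.0:  # margin violated
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep=1, **svm_options):
    """Recursive feature elimination driven by squared linear-SVM weights.

    Returns (surviving feature indices, indices in order of elimination)."""
    active = list(range(len(X[0])))
    eliminated = []
    while len(active) > n_keep:
        w = train_linear_svm([[row[j] for j in active] for row in X], y, **svm_options)
        weakest = min(range(len(active)), key=lambda k: w[k] ** 2)
        eliminated.append(active.pop(weakest))
    return active, eliminated


if __name__ == "__main__":
    # Feature 0 separates the classes; features 1 and 2 are identical in both classes.
    X = [[2.0, 0.3, -0.1], [1.5, -0.2, 0.2], [1.0, 0.1, 0.0],
         [-2.0, 0.3, -0.1], [-1.5, -0.2, 0.2], [-1.0, 0.1, 0.0]]
    y = [1, 1, 1, -1, -1, -1]
    print(svm_rfe(X, y, n_keep=1))
```

On this toy data the two uninformative features are eliminated first and feature 0 survives. Retraining after every removal is what distinguishes RFE from ranking all features once by the initial weights.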
Table S-3: Prediction Performance (%) and the Number of Important Features for the SVM and SVM_NL Classifiers Evaluated After Feature Selection is Applied to the Whole Dataset
Classifier | Feature Selection | 52-20-split Validation (50 trials) | 12-fold CV (10 trials) | LOOCV | # Important Features
pos-ion-mode (n = 360)
SVM | SVMRFE | 81.6 | 87.6 | 91.7 | 36
SVM | L1SVM | 72.9 | 75.1 | 76.4 | 36
SVM_NL | SVMRFE_NL | 76.2 | 81.1 | 83.3 | 22
SVM_NL | SVMRW | 60.5 | 61.3 | 65.3 | 32
neg-ion-mode (n = 232)
SVM | SVMRFE | 94.0 | 98.5 | 100.0 | 47
SVM | L1SVM | 82.5 | 91.8 | 95.8 | 46
SVM_NL | SVMRFE_NL | 88.5 | 95.7 | 97.2 | 23
SVM_NL | SVMRW | 77.4 | 83.3 | 88.9 | 32
Table S-4 compares the prediction performance of SVM combined with each feature selection method under more conservative settings (Figure 2b, where at each evaluation the feature selection method was first applied to the training dataset and the prediction performance of the selected feature subset was then measured on the test dataset), and Table S-5 gives the average number of important features identified in these models. The best prediction performance in this setting is 80.6% (SVMRFE on the neg-ion-mode dataset under LOOCV), which is comparable to the best prediction performance without feature selection (81.9%), while the feature size is reduced from 232 to an average of 41. LOOCV evaluation generally yields higher test accuracy than the other two evaluation procedures, reflecting the effect of training set size on test accuracy: each LOOCV model is trained on all but one sample, whereas 52-20-split validation trains on only 52 samples. The LOOCV results indicate that feature selection using SVMRFE achieved the best prediction performance, L1SVM was the second-best feature selection method, and SVMRW was the worst. Both the 52-20-split validation and 12-fold CV results indicate that i) L1SVM performed best on the neg-ion-mode dataset, ii) SVMRFE_NL performed best on the pos-ion-mode dataset, and iii) SVMRW resulted in the worst prediction accuracy. Overall, no single method emerged as a clear winner.
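The key difference between the two protocols is whether feature selection sees the test samples. A minimal sketch of the more conservative protocol, in which selection is repeated inside every training split, is shown below. The selector (rank features by the absolute difference of class means, with labels coded 0/1) and the nearest-centroid classifier are hypothetical stand-ins chosen to keep the example dependency-free; they are not SVMRFE, L1SVM, or SVMRW.

```python
def select_top_k(X, y, k=1):
    """Hypothetical filter: keep the k features with the largest |mean1 - mean0|."""
    def class_mean(label, j):
        vals = [row[j] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)
    scores = [abs(class_mean(1, j) - class_mean(0, j)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def fit_centroids(X, y):
    """Per-class mean vectors."""
    return {lab: [sum(col) / len(col) for col in zip(*[r for r, l in zip(X, y) if l == lab])]
            for lab in set(y)}


def predict_centroid(model, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))


def accuracy_with_selection_in_loop(X, y, splits, select=select_top_k,
                                    fit=fit_centroids, predict=predict_centroid):
    """Percent correct on held-out samples; selection uses training rows only."""
    correct = total = 0
    for train_idx, test_idx in splits:
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        features = select(X_tr, y_tr)  # test samples never influence this choice
        model = fit([[row[j] for j in features] for row in X_tr], y_tr)
        for i in test_idx:
            correct += int(predict(model, [X[i][j] for j in features]) == y[i])
            total += 1
    return 100.0 * correct / total
```

Calling `select` once on the full dataset before the loop would instead mimic the whole-dataset setting, whose estimates can be optimistic because the held-out samples have already influenced which features are kept.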
Table S-4: Prediction Performance (%) Evaluated After Feature Selection is Applied to Training Subsampling of Dataset during Each Validation
Classifier | Feature Selection | 52-20-split Validation (50 trials) | 12-fold CV (10 trials) | LOOCV Accuracy | LOOCV Sensitivity | LOOCV Specificity
pos-ion-mode (n = 360)
SVM | SVMRFE | 64.0 | 67.5 | 72.2 | 64.9 | 80.0
SVM | L1SVM | 65.5 | 70.6 | 70.8 | 70.3 | 71.4
SVM_NL | SVMRFE_NL | 66.5 | 71.4 | 66.7 | 73.0 | 60.0
SVM_NL | SVMRW | 60.2 | 59.7 | 59.7 | 62.2 | 57.1
neg-ion-mode (n = 232)
SVM | SVMRFE | 68.4 | 74.7 | 80.6 | 86.5 | 74.3
SVM | L1SVM | 71.5 | 76.2 | 75.0 | 83.8 | 65.7
SVM_NL | SVMRFE_NL | 69.1 | 74.3 | 73.6 | 78.4 | 68.6
SVM_NL | SVMRW | 59.6 | 63.6 | 69.4 | 70.3 | 68.6
Table S-5: Statistics on the Average Number of Important Features of the Models Described in Table S-4
Classifier | Feature Selection | 52-20-split Validation (50 trials) | 12-fold CV (10 trials) | LOOCV
pos-ion-mode (n = 360)
SVM | SVMRFE | 25 ± 7 | 31 ± 8 | 35 ± 5
SVM | L1SVM | 30 ± 2 | 35 ± 2 | 36 ± 1
SVM_NL | SVMRFE_NL | 21 ± 7 | 30 ± 10 | 26 ± 7
SVM_NL | SVMRW | 20 ± 9 | 27 ± 11 | 31 ± 9
neg-ion-mode (n = 232)
SVM | SVMRFE | 27 ± 9 | 33 ± 8 | 41 ± 9
SVM | L1SVM | 34 ± 2 | 41 ± 2 | 44 ± 2
SVM_NL | SVMRFE_NL | 33 ± 8 | 37 ± 7 | 36 ± 9
SVM_NL | SVMRW | 32 ± 10 | 34 ± 7 | 34 ± 7
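Table S-5 reports panel sizes as mean ± SD across resampling runs, and Table S-7 lists features that were stable at an 80% threshold. A minimal sketch of both summaries is shown below. It assumes (the definition is not spelled out here) that a feature counts as stable if it is selected in at least 80% of the runs; `summarize_selections` is a hypothetical helper.

```python
from collections import Counter
from statistics import mean, stdev


def summarize_selections(selected_sets, threshold=0.8):
    """Summarize the feature panels chosen in repeated resampling runs.

    selected_sets: one collection of selected feature ids per run.
    Returns (mean panel size, sample SD of panel size, sorted stable features),
    where a feature counts as stable if it was selected in at least
    `threshold` of the runs (an assumed definition).
    """
    n_runs = len(selected_sets)
    sizes = [len(s) for s in selected_sets]
    counts = Counter(f for s in selected_sets for f in set(s))
    stable = sorted(f for f, c in counts.items() if c / n_runs >= threshold)
    return mean(sizes), stdev(sizes), stable


if __name__ == "__main__":
    runs = [{1, 2, 3}, {1, 2}, {1, 2, 4}, {1, 3}, {1, 2, 5}]
    m, s, stable = summarize_selections(runs)
    print(f"{m:.0f} ± {s:.0f}", stable)
```

In this toy example the panel size is 2.6 ± 0.55 features (printed as "3 ± 1" in the integer style of Table S-5), and only features 1 and 2 reach the 80% selection frequency.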
Analysis S-2: Detailed Description of the Effect of Bagging on Prediction Performance
The effects of the bagging strategy (bootstrap sampling repeated 101 times, i.e., T = 101) on the prediction performance for the multimode, pos-ion-mode, and neg-ion-mode datasets under LOOCV evaluation are summarized in Table S-6. The results indicate that bagging does not improve on the best prediction performance (80.6%). Although bagging did improve the classification accuracy with certain feature selection methods (highlighted in bold), it reduced the classification accuracy in other cases (highlighted in italics). Given these observations and its high computational cost, bagging was not evaluated in further tests.
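Bagging, as described above, trains the base classifier on T bootstrap resamples of the training set and combines their predictions; with a majority vote, an odd T such as 101 avoids ties between two classes. The sketch below assumes majority voting (the combination rule is not stated here) and uses a hypothetical nearest-centroid base learner in place of the SVM so that it runs without external libraries.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Hypothetical base learner: per-class mean vectors."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in groups.items()}


def predict_centroid(model, x):
    """Label of the nearest class centroid (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(model[label], x)))


def bagging_predict(X_train, y_train, X_test, fit=fit_centroids,
                    predict=predict_centroid, T=101, seed=0):
    """Majority vote over T models, each fit on a bootstrap resample."""
    rng = random.Random(seed)
    n = len(X_train)
    votes = [Counter() for _ in X_test]
    for _ in range(T):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        model = fit([X_train[i] for i in idx], [y_train[i] for i in idx])
        for j, x in enumerate(X_test):
            votes[j][predict(model, x)] += 1
    return [v.most_common(1)[0][0] for v in votes]
```

The T model fits per training set (and, under LOOCV, per left-out sample) are what make bagging computationally expensive.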
Table S-6: Averaged LOOCV Prediction Performance with Bagging (%): Feature Selection
Methods Applied to Training Subsampling of Dataset
Performance SVMRFE
multimode (n = 592)
72.2
pos-ion-mode (n = 360)
70.8
neg-ion-mode (n = 232)
80.6
L1SVM
SVMRFE_NL
SVMRW
79.2
80.6
70.8
73.6
65.3
61.1
70.8
76.4
66.7
Table S-7: Tentative identifications for stable (80% threshold) L1SVM and SVMRFE_NL-selected features from the multimode dataset (A).
Neutral Mass (Da) | Species Investigated | RT (min) | Estimated Formulae (in order of decreasing score) | Accuracy (ppm) | Score (%) | Tentative Identification | Source Entry | Spectrum Location | Model (Stable at 80%)
148.0129 | [M+H]+ | 116.88 | C4N6O, C5H9OPS, C4H8N2S2 | 0.1-11.7 | 99.5-90.1 |  |  | S-1 (a) | SVMRFE_NL, L1SVM
204.0695 | [M+H]+ | 116.87 | C12H13OP, C6H12N4O2S, C14H8N2, C11H12N2S | 3.7-12.8 | 96.5-91.5 |  |  | S-1 (b) | SVMRFE_NL, L1SVM
206.1306 (B) | [M+CH3COO]- | 176.83 | C6H18N6S, C13H18O2, C9H14N6, C7H19N4OP | 0.1-12.9 | 91.7-86.5 |  |  | S-1 (c) | L1SVM
220.1463 (B) | [M+HCOO]- | 176.83 | C7H10N6S, C10H16N6, C12H19F3, C11H21O3F, C14H20O2 | 0.1-12.1 | 94.0-89.5 |  |  | S-1 (c) | L1SVM
256.2398 | [M-H]- | 104.69 | C16H32O2 | 1.7 | 96.3 | 16 carboxylic acid isomers (e.g. palmitic acid) | see footnote (C) | S-1 (d) | SVMRFE_NL, L1SVM
266.1518 (B) | [M-H]- | 176.83 | C11H26N2OS2, C12H26O4S, C8H22N6O2S, C12H29P3, C10H22N2O6 | 2.7-15.0 | 96.9-90.7 |  |  | S-1 (c) | L1SVM
280.2460 | [M-H]- | 98.85 | C15H38P2, C15H36O2S | 4.1-8.6 | 95.0-94.4 |  |  | S-1 (e) | SVMRFE_NL, L1SVM
282.2154 (D) | [M-H]- | 139.70 | C17H30O3 | 14.5 | 99.3 | 12s-hydroxy-16-heptadecynoic acid | MID 35560 | S-1 (f) | SVMRFE_NL, L1SVM
284.2701 (E) | [M-H]- | 123.87 | C18H36O2 | 5.0 | 96.1 | 12 carboxylic acid isomers (e.g. stearic acid) | see footnote (F) | S-1 (g) | SVMRFE_NL, L1SVM
340.2489 | [M-H]- | 130.13 | C20H38P2, C20H37O2P, C22H32N2O, C17H32N4O3, C16H33N6P | 4.3-12.4 | 98.1-95.4 |  |  | S-1 (h) | SVMRFE_NL, L1SVM
354.1676 | [M-H]- | 42.40 | C14H22N6O5 | 6.9 | 95.4 | 6 peptide isomers (e.g. GlnHisAla) | see footnote (G) | S-1 (i) | SVMRFE_NL, L1SVM
358.2869 (H) | [M+HCOO]- | 154.65 | C24H38O2 | 0.7 | 93.2 | 11 bile acid isomers (e.g. 5b-chol-9(11)-en-24-oic acid) | see footnote (I) | S-1 (j) | L1SVM
368.1652 (J) | [M-H]- | 85.48 | C19H28O5S | 1.4 | 93.1 | 2 isomers (e.g. DHEA sulfate) | see footnote (K) | S-1 (k) | SVMRFE_NL, L1SVM
384.2831 (L) | [M+CH3COO]- | 90.74 | C26H40S, C23H44S2, C21H40N2O2S, C29H36, C18H44N2O2S2 | 3.4-13.9 | 94.0-85.9 |  |  | S-1 (l) | SVMRFE_NL, L1SVM
398.2982 (L) | [M+HCOO]- | 90.74 | C27H42S, C24H46S2, C22H42N2O2S, C25H38N2O2, C19H46N2O2S2 | 3.8-1.4 | 94.0-86.9 |  |  | S-1 (l) | SVMRFE_NL, L1SVM
433.3256 (M) | [M+HCOO]- | 91.97 | C26H43NO4 | 14.8 | 98.8 | Lithocholic acid glycine conjugate | HMDB 00698 (N) | S-1 (m) | SVMRFE_NL, L1SVM
444.3037 (L) | [M-H]- | 90.74 | C24H40N6S, C28H45PS, C28H44O2S, C25H49PS2, C25H48O2S2 | 0.45-13.0 | 94.1-91.9 |  |  | S-1 (l) | SVMRFE_NL, L1SVM
479.3310 (M) | [M-H]- | 91.97 | C24H50NO6P | 13.7 | 96.6 | 8 glycerophosphocholine isomers (e.g. PC(P-16:0/0:0)) | see footnote (O) | S-1 (m) | SVMRFE_NL, L1SVM
481.2835 | [M-H]- | 106.07 | C22H44NO8P | 6.3 | 90.4 | 10 glycerophosphocholine isomers (e.g. PC(10:0/4:0)) | see footnote (P) | S-1 (n) | SVMRFE_NL, L1SVM
481.3047 | [M-H]- | 116.28 | C12H35N17O4, C12H36N17O2P, C17H50N5O4P3, C16H39N11O6, C17H51N5O2P4 | 2.2-14.9 | 95.1-93.4 |  |  | S-1 (o) | L1SVM
495.3210 | [M+H]+ | 109.68 | C21H46N5O6P, C21H45N5O8, C18H37N15O2, C19H37N13O3, C20H47N7O3P2 | 1.1-13.7 | 99.7-98.8 |  |  | S-1 (p) | SVMRFE_NL, L1SVM
499.9613 | [M-H]- | 166.34 | C21H8O13S, C21H9O11PS, C20H10N2O8P2S, C19H22P6S2, C18H4N4O12S | 2.0-14.5 | 96.6-96.1 |  |  | S-1 (q) | SVMRFE_NL, L1SVM
505.2842 | [M-H]- | 100.09 | C22H47N5P4, C22H46N5O2P3, C17H31N17O2, C20H36N13OP, C19H41N9O3P2 | 0.9-12.1 | 99.5-97.9 |  |  | S-1 (r) | SVMRFE_NL, L1SVM
505.3308 (Q) | [M+CH3COO]- | 147.77 | C28H49N3OP2, C29H49NO2P2, C27H39N9O, C29H48NO4P, C25H44N7O2P | 2.5-13.8 | 94.0-92.6 |  |  | S-1 (s) | SVMRFE_NL, L1SVM
507.3131 | [M-H]- | 112.77 | C28H45NO7, C28H46NO5P, C26H46N5OPS, C27H45N3O4S, C26H37N9O2 | 0.1-12.8 | 97.3-96.2 |  |  | S-1 (t) | L1SVM
509.3156 | [M-H]- | 121.27 | C24H48NO8P | 7.6 | 91.8 | 6 glycerophospholipid isomers (e.g. PE(9:0/10:0)) | see footnote (R) | S-1 (u) | SVMRFE_NL, L1SVM
519.3330 | [M+H]+ | 100.17 | C26H50NO7P | 1.0 | 99.0 | 3 PC(18:2/0:0) isomers (e.g. LysoPC(18:2(9Z,12Z))) | see footnote (S) | S-1 (v) | SVMRFE_NL, L1SVM
519.3459 (Q) | [M+HCOO]- | 147.77 | C26H46N7O2P, C27H57NP4, C29H51N3OP2, C26H45N7O4, C27H45N5O5 | 1.7-14.2 | 93.3-92.8 |  |  | S-1 (s) | SVMRFE_NL, L1SVM
563.3363 (Q) | [M-H]- | 147.77 | C26H47N9OP2, C24H37N17, C28H57NO2P4, C25H37N15O, C27H46N7O4P | 2.7-10.2 | 94.0-93.0 |  |  | S-1 (s) | SVMRFE_NL, L1SVM
757.5678 | [M+H]+ | 127.85 | C42H80NO8P | 7.5 | 83.3 | 31 glycerophospholipid isomers (e.g. PE-NMe(18:1(9E)/18:1(9E))) | see footnote (T) | S-1 (w) | L1SVM
759.5775 (U) | [M+Na]+ | 138.38 | C42H82NO8P | 0.4 | 42.6 | 18 glycerophosphocholine isomers (e.g. PC(14:0/20:1(11Z))) | see footnote (V) | S-1 (x) | L1SVM
781.5595 (U) | [M+H]+ | 138.38 | C44H80NO8P | 3.4 | 46.0 | 32 glycerophosphocholine isomers (e.g. PC(14:0/22:4(7Z,10Z,13Z,16Z))) | see footnote (W) | S-1 (x) | L1SVM
787.6000 (X) | [M+Na]+ | 136.68 | C44H86NO8P | 11.6 | 74.6 | 22 glycerophosphocholine isomers (e.g. PC(14:0/22:1(13Z))) | see footnote (Y) | S-1 (y) | SVMRFE_NL, L1SVM
(A) Matches to identified compounds were made using accurate mass measurements and isotope cluster matching. For species which could not be matched
against metabolite databases, the top matching formulae (according to score) are listed (for features matching fewer than five formulae, all formulae are
shown). For features corresponding to multiple possible isomers, the following nomenclature is given: # isomers found including name of isomer [source
(cross-listed source, if any)].
(B) Adduct analysis yielded multiple possible ion species for this feature. All are listed as none could be matched against the databases.
(C) 16 isomers found including palmitic acid [LMFA 01010001 (HMDB 00220)], isopalmitic acid [LMFA 01020010], 2,6-dimethyl-tetradecanoic acid
[LMFA 01020038], 2,8-dimethyl-tetradecanoic acid [LMFA 01020039], 3-methyl-pentadecanoic acid [LMFA 01020164], 2-propyl-tridecanoic acid
[LMFA 01020165], 2-hexyl-decanoic acid [LMFA 01020166], 3-ethyl-3-methyl-tridecanoic acid [LMFA 01020167], 2-heptyl-nonanoic acid [LMFA
01020168], 6-ethyl-tetradecanoic acid [LMFA 01020169], 2,4-dimethyl-tetradecanoic acid [LMFA 01020170], 3,5-dimethyl-tetradecanoic acid
[LMFA 01020171], 4-hexyl-decanoic acid [LMFA 01020172], 2-ethyl-2-butyl-decanoic acid [LMFA 01020173], 13-methyl-pentadecanoic acid
[LMFA 01020192], 4,8,12-trimethyl-tridecanoic acid [LMFA 01020249].
(D) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(E) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(F) 12 isomers found including stearic acid [HMDB 00827 (LMFA 01010018, MID 189, MMCD cq_00998)], 10-methyl-heptadecanoic acid [MID 4292 (LMFA 01020013)], (+)-isostearic acid [MID 4293 (LMFA 01020014)], 2,6-dimethyl-hexadecanoic acid [MID 4324 (LMFA 01020042)], 4,8-dimethyl-hexadecanoic acid [MID 4325 (LMFA 01020043)], 2,14-dimethyl-hexadecanoic acid [MID 4326 (LMFA 01020044)], 4,14-dimethyl-hexadecanoic acid [MID 4327 (LMFA 01020045)], 6,14-dimethyl-hexadecanoic acid [MID 4328 (LMFA 01020046)], lambda isostearic acid [MID 4493 (LMFA 01020093)], neostearic acid [MID 4620 (LMFA 01020094)], 11,15-dimethyl-hexadecanoic acid [MID 34604 (LMFA 01020175)], 15-methyl-heptadecanoic acid [MID 34632 (LMFA 01020205)].
(G) 6 isomers found including Gln His Ala [MID 23091], Gln Ala His [MID 22217], Ala His Gln [MID 21229], Ala Gln His [MID 16023], His Gln Ala
[MID 20595], His Ala Gln [MID 18707].
(H) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(I) 11 isomers found including 5b-chol-9(11)-en-24-oic Acid [MID 42731 (LMST 04010142)], 5b-chol-11-en-24-oic Acid [MID 42732 (LMST 04010143)], 5b-chol-14-en-24-oic Acid [MID 42733 (LMST 04010144)], 5b-chol-2-en-24-oic Acid [MID 42757 (LMST 04010170)], 5b-chol-3-en-24-oic Acid [MID 42846 (LMST 04010263)], Chol-4-en-24-oic Acid [MID 42847 (LMST 04010264)], Chol-5-en-24-oic Acid [MID 42848 (LMST 04010265)], 5b-chol-6-en-24-oic Acid [MID 42849 (LMST 04010266)], 5b-chol-7-en-24-oic Acid [MID 42850 (LMST 04010267)], 5b-chol-8-en-24-oic Acid [MID 42851 (LMST 04010268)], 5b-chol-8(14)-en-24-oic Acid [MID 42852 (LMST 04010269)].
(J) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(K) 2 isomers found including DHEA sulfate [HMDB 01032 (LMST 05020010)], testosterone sulfate [HMDB 02833].
(L) Adduct analysis yielded multiple possible ion species for this feature. All are listed as none could be matched against the databases.
(M) Adduct analysis yielded multiple possible ion species for this feature. Only species that could be tentatively identified are listed.
(N) Cross-listed as MMCD cq-10750 and MID 5666.
(O) 8 isomers found including PC(P-16:0/0:0) [HMDB 10407 (LMGP 01070006)], PC(O-16:1/0:0) [LMGP 01050100, 01050101, 01050102, 01050103,
01050104, 01070004, 01070005].
(P) 10 isomers found including PC(10:0/4:0) [LMGP 01010403], PC(12:0/2:0) [LMGP 01010443], PC(6:0/8:0) [LMGP 01011233, 01011234], PC(7:0/7:0)
[LMGP 01011238, 01011239, 01011240], PC(8:0/6:0) [LMGP 01011248, 01011249], PC(9:0/5:0) [LMGP 01011269].
(Q) Adduct analysis yielded multiple possible ion species for this feature. All are listed as none could be matched against the databases.
(R) 6 isomers found including PE(9:0/10:0)[U] [MID 40490 (LMGP 02010091)], PE(10:0/9:0)[U] [MID 40669 (LMGP 02010272)], PC(14:0/2:0) [LMGP
01010504], PC(8:0/8:0) [LMGP 01011251, 01011252, 01011253].
(S) 3 isomers found including glycoursodeoxycholic acid 3-sulfate [HMDB 02409 (MMCD cq_17361, MID 6670)], glycochenodeoxycholic acid 7-sulfate [HMDB 02496 (MMCD cq_17159, MID 6692)], glycochenodeoxycholate-3-sulfate [HMDB 02497 (MMCD cq_17507, MID 6702)].
(T) 31 isomers found including PE-NMe(18:1/18:1) [LMGP 02010331 (MMCD cq_17959), 02010333, 02010338, 02010350], PC(16:0/18:2) [LMGP
01010585, 01010586, 01010587, 01010588, 01010589, 01010590, 01010591, 01010592, 01010593, 01010594, 01010595, 01010596], PC(16:1/18:1)
[LMGP 01010678, 01010680, 01010687, 01010688, 01010689], PC(17:1/17:1) [LMGP 01010726, 01010727, 01010728], PC(18:0/16:2(2E,4E)) [LMGP
01010745], PC(18:1/16:1) [LMGP 01010886, 01010887], PC(18:2/16:0) [LMGP 01010920, 01010926, 01010932, 01010933].
(U) Adduct analysis yielded several possible ion species for the selected feature. Only species having tentative matches are listed.
(V) 18 isomers found including PC(14:0/20:1(11Z)) [HMDB 07879], PC(16:0/18:1) [LMGP 01010005, 01010575, 01010576, 01010577, 01010578, 01010579,
01010580, 01010581, 01010582, 01010583, 01010584], PC(16:1/18:0) [LMGP 01010679, 01010686], PC(18:0/16:1(9Z)) [LMGP 01010744],
PC(18:1/16:0) [LMGP 01010874, 01010884, 01010885].
(W) 32 isomers found including PC(14:0/22:4(7Z,10Z,13Z,16Z)) [HMDB 07889], PC(16:0/20:4) [LMGP 01010007, 01010629, 01010630, 01010631],
PC(18:0/18:4) [LMGP 01010772, 01010773, 01010774, 01010775, 01010776], PC(18:1/18:3) [LMGP 01010897, 01010898, 01010899], PC(18:2/18:2)
[LMGP 01010918, 01010919, 01010921, 01010922, 01010923, 01010924, 01010925, 01010927, 01010928, 01010929, 01010930, 01010937, 01010938,
01010939], PC(18:3/18:1) [LMGP 01010949, 01010955], PC(20:4/16:0) [LMGP 01011049, 01011050, 01011056].
(X) Adduct analysis yielded several possible ion species for the selected feature. Only 1 species could be tentatively identified.
(Y) 22 isomers found including PC(14:0/22:1(13Z)) [HMDB 07887], PC(16:0/20:1(11Z)) [LMGP 01010618], PC(18:0/18:1) [LMGP 01010749, 01010750, 01010751, 01010752, 01010753, 01010754, 01010755, 01010756, 01010757, 01010758, 01010759, 01010760, 01010761, 01010762, 01010763], PC(18:1/18:0) [LMGP 01010840, 01010875, 01010888, 01010889], PC(20:1(11Z)/16:0)[U] [LMGP 01011037].
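The Accuracy (ppm) column compares each measured neutral mass with the monoisotopic mass of a candidate formula. The sketch below reproduces that arithmetic for three of the single-formula rows of Table S-7, using standard monoisotopic atomic masses for the elements that occur in the table. The table appears to report the magnitude of the error, whereas this sketch returns the signed value; the helper names are hypothetical.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "F": 18.99840322,
    "P": 30.97376163,
    "S": 31.97207100,
}


def monoisotopic_mass(formula: str) -> float:
    """Sum of monoisotopic atomic masses for a formula such as 'C16H32O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


if __name__ == "__main__":
    for mass, formula in [(256.2398, "C16H32O2"), (282.2154, "C17H30O3"),
                          (284.2701, "C18H36O2")]:
        print(formula, round(ppm_error(mass, monoisotopic_mass(formula)), 1))
```

The computed errors are -1.7, -14.5, and -5.0 ppm, matching the magnitudes listed for palmitic acid, the C17H30O3 feature, and stearic acid.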
Figure S-1: Mass spectra for L1SVM and SVMRFE_NL-selected features from the multimode dataset (structure and name of 1st isomer listed is included as inset in spectra for those features that were tentatively matched to compounds).
[Figure S-1, panels (a)-(y): mass spectra plotted as intensity (A.U.) versus m/z, one panel per entry in the Spectrum Location column of Table S-7; each panel is annotated with the feature's neutral mass (M) and assigned ion species, and tentatively matched features carry the name and database entry of the first listed isomer.]
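The panel annotations in Figure S-1 relate each observed m/z to a neutral mass M through the assumed ion species, and footnotes B, L, and Q of Table S-7 note that one observed ion can correspond to several species. A minimal sketch of that conversion for singly charged ions is shown below, using standard monoisotopic masses and an explicit electron mass; `neutral_mass` is a hypothetical helper, and its results agree with the neutral masses in Table S-7 to within about 1 mDa rather than exactly.

```python
# Monoisotopic masses (Da) and the electron mass.
H, C, O, NA = 1.00782503207, 12.0, 15.99491461956, 22.9897692809
ELECTRON = 0.00054857990946

# Mass added to the neutral molecule M to give each singly charged ion.
ADDUCT_SHIFT = {
    "[M+H]+": H - ELECTRON,                           # add a proton
    "[M+Na]+": NA - ELECTRON,                         # add a sodium cation
    "[M-H]-": -(H - ELECTRON),                        # remove a proton
    "[M+HCOO]-": H + C + 2 * O + ELECTRON,            # add a formate anion
    "[M+CH3COO]-": 2 * C + 3 * H + 2 * O + ELECTRON,  # add an acetate anion
}


def neutral_mass(mz: float, species: str) -> float:
    """Neutral monoisotopic mass M implied by an observed m/z and ion species."""
    return mz - ADDUCT_SHIFT[species]


if __name__ == "__main__":
    # One observed ion, three candidate interpretations (cf. panel l).
    for species in ("[M-H]-", "[M+HCOO]-", "[M+CH3COO]-"):
        print(species, round(neutral_mass(443.2964, species), 4))
```

For the ion at m/z 443.2964 this gives 444.3037, 398.2982, and 384.2825, compared with 444.3037, 398.2982, and 384.2831 in Table S-7.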