Additional file 1: Detailed experimental protocols, supplementary analysis of results, and metabolite identification

Method S-1: Detailed Description of Sample Pretreatment and LC/TOF MS Protocols

Serum samples were thawed, and proteins were precipitated by adding acetonitrile to the serum sample in a 5:1 ratio (1000 µL acetonitrile + 200 µL serum). The mixture was vortexed for 1 minute and incubated at room temperature for 40 minutes. The sample was then centrifuged at 13,000 × g for 15 minutes and the supernatant was retained. The supernatant was vacuum evaporated and the residue was reconstituted in 80% acetonitrile/0.1% TFA.

LC/TOF MS analyses were performed on a JEOL AccuTOF (Tokyo, Japan) mass spectrometer coupled to an Agilent 1100 Series LC system (Santa Clara, CA) via an ESI source. The TOF resolving power, measured at full width at half maximum, was 6000, and the observed mass accuracies ranged from 5 to 15 ppm, depending on the signal-to-noise ratio of the particular ion investigated. The LC system was equipped with a solvent degasser, a binary pump, an autosampler, and a thermostatic column compartment (held at 25 °C). The injection volume was 15 µL in all cases. Reverse-phase separation of serum samples was performed using a Symmetry® C18 column (3.5 µm, 2.1 × 150 mm, pore size 100 Å; Waters, Milford, MA) at a flow rate of 150 µL min-1. The analytical column was preceded by a Zorbax® RX-C18 guard column (5.0 µm, 4.6 × 12.5 mm, pore size 2 µm; Agilent). The LC solvent mixtures were A = 0.1% formic acid in water and B = 0.1% formic acid in acetonitrile. After a pre-run wash and equilibration with 5% B for 15 minutes, data acquisition was started and the solvent composition was varied according to the solvent program described in Table S-1. After analysis of each serum specimen, a 0.20 mM sodium trifluoroacetate (NaTFA) standard was run for mass drift compensation.
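The mass accuracies quoted above, like the Accuracy (ppm) column of Table S-7, express how far a measured mass deviates from a theoretical one, in parts per million. The following Python sketch shows that calculation. It assumes standard monoisotopic element masses and simple formulae without brackets or charges; the function names are illustrative and not part of the software used in this study. Comparing the tabulated neutral mass of palmitic acid (256.2398 Da) with the monoisotopic mass of C16H32O2 reproduces the 1.7 ppm listed in Table S-7.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element
# occurring in the estimated formulae of Table S-7.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "F": 18.99840322,
    "P": 30.97376163,
    "S": 31.97207100,
}


def formula_mass(formula):
    """Monoisotopic neutral mass of a simple formula such as 'C16H32O2'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


theoretical = formula_mass("C16H32O2")                    # palmitic acid
print(round(theoretical, 4))                              # 256.2402
print(round(abs(ppm_error(256.2398, theoretical)), 1))    # 1.7
```

The sign of `ppm_error` is kept so that systematic drift (consistently positive or negative errors) remains visible; Table S-7 reports magnitudes only.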
For NaTFA analysis, 100% B at a flow rate of 300 µL min-1 was used and data were acquired for 10 minutes. After each injection of a sample or of the drift correction standard, the column was washed with 100% B for 30 minutes.

Table S-1: LC solvent gradient used in metabolomic experiments.

Time (min) | %B (acetonitrile/0.1% formic acid) | Flow rate (µL min-1)
Pre-run column equilibration
0.0 | 100 | 300
10.0 | 5 | 150
15.0 | 5 | 150
Sample run
0.0 | 5 | 150
5.0 | 5 | 150
10.0 | 20 | 150
20.0 | 25 | 150
28.0 | 30 | 150
38.0 | 35 | 150
50.0 | 40 | 150
90.0 | 45 | 150
100.0 | 50 | 150
110.0 | 60 | 150
120.0 | 75 | 150
130.0 | 85 | 150
160.0 | 95 | 150
180.0 | 100 | 150
Post-run column wash
0.0 | 100 | 300
30.0 | 100 | 300
NaTFA standard run
0.0 | 100 | 300
10.0 | 100 | 300
Post-run column wash
0.0 | 100 | 300
30.0 | 100 | 300

Spectral data were collected in the m/z 100-1750 range with a spectral recording interval of 1.5 s and a data sampling interval of 0.5 ns for both positive and negative ion ESI modes. The TOF mass spectrometer settings for positive or negative ion mode were as follows: needle voltage, ±2000 V; ring lens, +8 V or -9 V; orifice 1, +30 V or -69 V; orifice 2, +6 V or -8 V; desolvation chamber temperature, 250 °C; orifice 1 temperature, 80 °C; nebulizing gas flow rate, 1.0 L min-1; desolvation gas flow rate, 2.5 L min-1; and detector voltage, ±2800 V. The TOF analyzer pressure was 4.8E-6 Pa during analysis. The RF ion guide voltage amplitude was swept to ensure adequate transmission of analytes over a wide range of m/z values, with the following sweep parameters: initial peaks voltage, 700 V; initial time, 20%; sweep time, 50%; final peaks voltage, 2500 V. After the LC/TOF MS data were collected, they were centroided, mass drift corrected using the NaTFA reference spectrum, and exported in NetCDF format for further mining. To ensure maximum reproducibility in the metabolomic experiments, all serum specimens were run consecutively within a 2.5-month period.
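Table S-1 lists %B only at discrete time points. Assuming the pump ramps linearly between consecutive entries (the usual convention for binary gradient tables, though not stated explicitly here), the mobile-phase composition at any time during the sample run can be interpolated as in the following Python sketch. The function name `percent_b` is illustrative.

```python
from bisect import bisect_right

# Sample-run segment of Table S-1 as (time in min, %B) pairs.
SAMPLE_RUN = [
    (0.0, 5), (5.0, 5), (10.0, 20), (20.0, 25), (28.0, 30), (38.0, 35),
    (50.0, 40), (90.0, 45), (100.0, 50), (110.0, 60), (120.0, 75),
    (130.0, 85), (160.0, 95), (180.0, 100),
]


def percent_b(t, program=SAMPLE_RUN):
    """%B at time t (min), assuming linear ramps between tabulated points."""
    times = [time for time, _ in program]
    if not times[0] <= t <= times[-1]:
        raise ValueError(f"t = {t} min is outside the gradient program")
    i = bisect_right(times, t) - 1
    if i == len(program) - 1:          # exactly at the last time point
        return float(program[-1][1])
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


for t in (0.0, 7.5, 70.0, 180.0):
    print(t, percent_b(t))
# 0.0 5.0
# 7.5 12.5
# 70.0 42.5
# 180.0 100.0
```

Under this assumption the gradient is steepest between 110 and 130 minutes (1.5 %B per minute) and shallowest between 50 and 90 minutes (0.125 %B per minute).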
Every cancer sample was randomly paired with a normal sample, and the two were run on the same day so that no temporal bias was introduced in the way samples were analyzed. Sample pairs were run in random order and in duplicate.

Analysis S-1: Detailed Description of the Prediction and Feature Selection Performance for the Pos-Ion-Mode and Neg-Ion-Mode Datasets

The prediction performance of the pos-ion-mode and neg-ion-mode datasets evaluated without feature selection is summarized in Table S-2. The neg-ion-mode dataset had a better prediction performance than the pos-ion-mode dataset, with the highest prediction performance (81.9%) obtained using the linear SVM classifier under LOOCV. For the pos-ion-mode dataset, the nonlinear SVM classifier (SVM_NL) outperformed the linear SVM classifier under all three evaluation procedures, while for the neg-ion-mode dataset the linear SVM performed better.

Table S-2: Prediction Performance (%) without Feature Selection

Classifier | 52-20-split validation (50 trials) | 12-fold CV (10 trials) | LOOCV accuracy | LOOCV sensitivity | LOOCV specificity
pos-ion-mode (n = 360)
SVM | 70 | 71.3 | 72.2 | 64.9 | 80
SVM_NL | 71.8 | 75.6 | 73.6 | 78.4 | 68.6
neg-ion-mode (n = 232)
SVM | 73.2 | 80.4 | 81.9 | 81.1 | 82.9
SVM_NL | 72.4 | 79.9 | 80.6 | 81.1 | 80

Table S-3 summarizes the prediction performance of the pos-ion-mode and neg-ion-mode datasets after feature selection and gives the number of important features identified for each model. In this setting (Figure 2a), each feature selection method was applied to the whole dataset, and the prediction performance of the dataset containing only the selected feature subset (panel) was then measured using the three evaluation processes. Under LOOCV, which is perhaps the most accurate evaluation technique in this low-sample setting, the estimated predictive performance was surprisingly high: above 90% for four of the eight models, and 100.0% for SVMRFE on the neg-ion-mode dataset. Because the test samples also contributed to feature selection in this setting, these estimates are likely to be optimistic.
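The LOOCV columns of Table S-2 report accuracy, sensitivity, and specificity. The Python sketch below computes all three from confusion-matrix counts. The counts are hypothetical: this section does not state the class sizes, but 37 positive and 35 negative samples with the counts shown reproduce the LOOCV row for the linear SVM on the pos-ion-mode dataset (72.2%, 64.9%, 80.0%).

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity (%) from confusion-matrix counts."""
    positives, negatives = tp + fn, tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / (positives + negatives),
        "sensitivity": 100.0 * tp / positives,  # true-positive rate
        "specificity": 100.0 * tn / negatives,  # true-negative rate
    }


# Hypothetical counts: 37 positive and 35 negative samples (72 in total).
metrics = classification_metrics(tp=24, fn=13, tn=28, fp=7)
print({name: round(value, 1) for name, value in metrics.items()})
# {'accuracy': 72.2, 'sensitivity': 64.9, 'specificity': 80.0}
```

Because accuracy is the class-size-weighted average of sensitivity and specificity, it always lies between the two, which gives a quick consistency check when reading these tables.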
For both the pos-ion-mode and neg-ion-mode datasets, the features selected by SVMRFE had the best discriminative power, followed by those selected by SVMRFE_NL, while SVMRW performed the worst.

Table S-3: Prediction Performance (%) and the Number of Important Features for the SVM and SVM_NL Classifiers Evaluated After Feature Selection Is Applied to the Whole Dataset

Classifier | Feature selection | 52-20-split validation (50 trials) | 12-fold CV (10 trials) | LOOCV | # Important features
pos-ion-mode (n = 360)
SVM | SVMRFE | 81.6 | 87.6 | 91.7 | 36
SVM | L1SVM | 72.9 | 75.1 | 76.4 | 36
SVM_NL | SVMRFE_NL | 76.2 | 81.1 | 83.3 | 22
SVM_NL | SVMRW | 60.5 | 61.3 | 65.3 | 32
neg-ion-mode (n = 232)
SVM | SVMRFE | 94.0 | 98.5 | 100.0 | 47
SVM | L1SVM | 82.5 | 91.8 | 95.8 | 46
SVM_NL | SVMRFE_NL | 88.5 | 95.7 | 97.2 | 23
SVM_NL | SVMRW | 77.4 | 83.3 | 88.9 | 32

Table S-4 compares the prediction performance of SVM in combination with the feature selection methods under more conservative settings (Figure 2b): at each evaluation, the feature selection method was first applied to a training dataset, and the prediction performance of the selected feature subset was then measured on the test dataset. Table S-5 gives the average number of important features identified in these models. The best prediction performance of the pos-ion-mode and neg-ion-mode datasets in this setting is 80.6% (SVMRFE on the neg-ion-mode dataset under LOOCV), which is comparable to the prediction performance without feature selection (81.9%), while the number of features is reduced from 232 to an average of 41. LOOCV and 12-fold CV, which train on more samples per fold, generally yield a higher test accuracy than 52-20-split validation, illustrating the effect of training set size on test accuracy; LOOCV gives the highest accuracy of the three procedures for half of the models. The LOOCV results indicate that SVMRFE achieved the best prediction performance, L1SVM was the second best feature selection method, and SVMRW was the worst.
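The difference between the settings of Figure 2a and Figure 2b is where feature selection happens relative to the data split. The following Python sketch generates the three kinds of splits used here and repeats feature selection inside every training fold, as in the more conservative setting. A mean-difference feature ranking and a nearest-centroid classifier stand in for SVMRFE and the SVM classifiers; both, like all function names, are hypothetical simplifications rather than the methods used in this study. Calling `top_k_features` once on all samples before the loop would instead mimic the whole-dataset setting of Table S-3. The split sizes assume 72 samples in total, as implied by the 52-20 split.

```python
import random


def holdout_split(n, n_train, rng):
    """One random train/test split, e.g. 52 training and 20 test samples."""
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[:n_train], idx[n_train:]


def kfold_splits(n, k, rng):
    """k-fold CV: every sample is tested exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    for f in range(k):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def loocv_splits(n):
    """Leave-one-out CV: n folds, each holding out a single sample."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]


def top_k_features(X, y, rows, k):
    """Hypothetical stand-in for SVMRFE: rank features by the absolute
    difference between class means, using only the given rows."""
    def class_mean(f, label):
        vals = [X[i][f] for i in rows if y[i] == label]
        return sum(vals) / len(vals)

    n_feat = len(X[0])
    scores = [abs(class_mean(f, 1) - class_mean(f, 0)) for f in range(n_feat)]
    return sorted(range(n_feat), key=lambda f: scores[f], reverse=True)[:k]


def nearest_centroid(X, y, rows, feats, x):
    """Hypothetical stand-in for the SVM: pick the class with the closer centroid."""
    def sq_dist(label):
        members = [i for i in rows if y[i] == label]
        return sum(
            (x[f] - sum(X[i][f] for i in members) / len(members)) ** 2
            for f in feats
        )

    return min((0, 1), key=sq_dist)


def evaluate(X, y, splits, k):
    """Accuracy when feature selection is repeated inside every training fold."""
    correct = total = 0
    for train, test in splits:
        feats = top_k_features(X, y, train, k)  # test samples are never used here
        for i in test:
            correct += nearest_centroid(X, y, train, feats, X[i]) == y[i]
            total += 1
    return correct / total


rng = random.Random(0)
train, test = holdout_split(72, 52, rng)
print(len(train), len(test))                             # 52 20
print({len(tr) for tr, _ in kfold_splits(72, 12, rng)})  # {66}
print({len(tr) for tr, _ in loocv_splits(72)})           # {71}

# Toy data: feature 0 separates the classes, feature 1 is uninformative.
y = [0] * 10 + [1] * 10
X = [[10.0 * label + i % 3, float(i % 5)] for i, label in enumerate(y)]
print(evaluate(X, y, loocv_splits(len(y)), k=1))        # 1.0
```

Under the 72-sample assumption, the three procedures train on 52, 66, and 71 samples, respectively, which is the training set size effect discussed above.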
Both 52-20-split validation and 12-fold CV results indicate that (i) L1SVM performed the best on the neg-ion-mode dataset, (ii) SVMRFE_NL performed the best on the pos-ion-mode dataset, and (iii) SVMRW resulted in the worst prediction accuracy. Overall, no clear winner could be identified among the tested methods.

Table S-4: Prediction Performance (%) Evaluated After Feature Selection Is Applied to the Training Subsample of the Dataset during Each Validation

Classifier | Feature selection | 52-20-split validation (50 trials) | 12-fold CV (10 trials) | LOOCV accuracy | LOOCV sensitivity | LOOCV specificity
pos-ion-mode (n = 360)
SVM | SVMRFE | 64.0 | 67.5 | 72.2 | 64.9 | 80.0
SVM | L1SVM | 65.5 | 70.6 | 70.8 | 70.3 | 71.4
SVM_NL | SVMRFE_NL | 66.5 | 71.4 | 66.7 | 73.0 | 60.0
SVM_NL | SVMRW | 60.2 | 59.7 | 59.7 | 62.2 | 57.1
neg-ion-mode (n = 232)
SVM | SVMRFE | 68.4 | 74.7 | 80.6 | 86.5 | 74.3
SVM | L1SVM | 71.5 | 76.2 | 75.0 | 83.8 | 65.7
SVM_NL | SVMRFE_NL | 69.1 | 74.3 | 73.6 | 78.4 | 68.6
SVM_NL | SVMRW | 59.6 | 63.6 | 69.4 | 70.3 | 68.6

Table S-5: Statistics on the Average Number of Important Features of the Models Described in Table S-4

Classifier | Feature selection | 52-20-split validation (50 trials) | 12-fold CV (10 trials) | LOOCV
pos-ion-mode (n = 360)
SVM | SVMRFE | 25 ± 7 | 31 ± 8 | 35 ± 5
SVM | L1SVM | 30 ± 2 | 35 ± 2 | 36 ± 1
SVM_NL | SVMRFE_NL | 21 ± 7 | 30 ± 10 | 26 ± 7
SVM_NL | SVMRW | 20 ± 9 | 27 ± 11 | 31 ± 9
neg-ion-mode (n = 232)
SVM | SVMRFE | 27 ± 9 | 33 ± 8 | 41 ± 9
SVM | L1SVM | 34 ± 2 | 41 ± 2 | 44 ± 2
SVM_NL | SVMRFE_NL | 33 ± 8 | 37 ± 7 | 36 ± 9
SVM_NL | SVMRW | 32 ± 10 | 34 ± 7 | 34 ± 7

Analysis S-2: Detailed Description of the Effect of Bagging on Prediction Performance

The effects of the bagging strategy (bootstrap sampling repeated 101 times, i.e., T = 101) on the prediction performance for the multimode, pos-ion-mode, and neg-ion-mode datasets under LOOCV evaluation are summarized in Table S-6. The results indicate that bagging does not boost the best prediction performance (80.6%).
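Bagging with T = 101 means that T bootstrap resamples of the training set are drawn, a model is fitted to each resample, and the T predictions are combined by majority vote; an odd T rules out tied votes between the two classes. The Python sketch below illustrates this generic procedure with a hypothetical one-feature nearest-centroid base classifier. It does not reproduce how feature selection was combined with bagging in this study, which is not detailed here.

```python
import random
from collections import Counter


def nearest_centroid_1d(train_x, train_y, x):
    """Hypothetical base classifier: nearest class mean on one feature."""
    centroids = {}
    for label in set(train_y):
        vals = [v for v, t in zip(train_x, train_y) if t == label]
        centroids[label] = sum(vals) / len(vals)
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def bagged_predict(train_x, train_y, x, base=nearest_centroid_1d, T=101, seed=0):
    """Fit `base` on T bootstrap resamples and return the majority vote."""
    rng = random.Random(seed)
    n = len(train_x)
    votes = Counter()
    for _ in range(T):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        sample_x = [train_x[i] for i in idx]
        sample_y = [train_y[i] for i in idx]
        votes[base(sample_x, sample_y, x)] += 1
    return votes.most_common(1)[0][0]


train_x = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
print(bagged_predict(train_x, train_y, 1.5))   # 0
print(bagged_predict(train_x, train_y, 11.5))  # 1
```

Each of the T models must be refitted, which is the computational cost referred to in the discussion of Table S-6.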
Although bagging did improve the classification accuracy for certain feature selection methods (highlighted in bold), it also reduced the classification accuracy in other cases (highlighted in italics). Because of these observations and its high computational cost, bagging was not evaluated in further tests.

Table S-6: Averaged LOOCV Prediction Performance with Bagging (%): Feature Selection Methods Applied to Training Subsampling of Dataset
Performance SVMRFE multimode (n = 592) 72.2 pos-ion-mode (n = 360) 70.8 neg-ion-mode (n = 232) 80.6 L1SVM SVMRFE_NL SVMRW 79.2 80.6 70.8 73.6 65.3 61.1 70.8 76.4 66.7

Table S-7: Tentative identifications for stable (80% threshold) L1SVM and SVMRFE_NL-selected features from the multimode dataset.(A)

Neutral mass (Da) | Species investigated | RT (min) | Estimated formulae (in order of decreasing score) | Accuracy (ppm) | Score (%) | Tentative identification | Source entry | Spectrum location | Model (stable at 80%)
148.0129 | [M+H]+ | 116.88 | C4N6O, C5H9OPS, C4H8N2S2 | 0.1-11.7 | 99.5-90.1 | - | - | S-1 (a) | SVMRFE_NL, L1SVM
204.0695 | [M+H]+ | 116.87 | C12H13OP, C6H12N4O2S, C14H8N2, C11H12N2S | 3.7-12.8 | 96.5-91.5 | - | - | S-1 (b) | SVMRFE_NL, L1SVM
206.1306 (B) | [M+CH3COO]- | 176.83 | C6H18N6S, C13H18O2, C9H14N6, C7H19N4OP | 0.1-12.9 | 91.7-86.5 | - | - | S-1 (c) | L1SVM
220.1463 (B) | [M+HCOO]- | 176.83 | C7H10N6S, C10H16N6, C12H19F3, C11H21O3F, C14H20O2 | 0.1-12.1 | 94.0-89.5 | - | - | S-1 (c) | L1SVM
256.2398 | [M-H]- | 104.69 | C16H32O2 | 1.7 | 96.3 | 16 carboxylic acid isomers (e.g. palmitic acid) | see footnote (C) | S-1 (d) | SVMRFE_NL, L1SVM
266.1518 (B) | [M-H]- | 176.83 | C11H26N2OS2, C12H26O4S, C8H22N6O2S, C12H29P3, C10H22N2O6 | 2.7-15.0 | 96.9-90.7 | - | - | S-1 (c) | L1SVM
280.2460 | [M-H]- | 98.85 | C15H38P2, C15H36O2S | 4.1-8.6 | 95.0-94.4 | - | - | S-1 (e) | SVMRFE_NL, L1SVM
282.2154 (D) | [M-H]- | 139.70 | C17H30O3 | 14.5 | 99.3 | 12s-hydroxy-16-heptadecynoic acid | MID 35560 | S-1 (f) | SVMRFE_NL, L1SVM
284.2701 (E) | [M-H]- | 123.87 | C18H36O2 | 5.0 | 96.1 | 12 carboxylic acid isomers (e.g. stearic acid) | see footnote (F) | S-1 (g) | SVMRFE_NL, L1SVM
340.2489 | [M-H]- | 130.13 | C20H38P2, C20H37O2P, C22H32N2O, C17H32N4O3, C16H33N6P | 4.3-12.4 | 98.1-95.4 | - | - | S-1 (h) | SVMRFE_NL, L1SVM
354.1676 | [M-H]- | 42.40 | C14H22N6O5 | 6.9 | 95.4 | 6 peptide isomers (e.g. GlnHisAla) | see footnote (G) | S-1 (i) | SVMRFE_NL, L1SVM
358.2869 (H) | [M+HCOO]- | 154.65 | C24H38O2 | 0.7 | 93.2 | 11 bile acid isomers (e.g. 5b-chol-9(11)-en-24-oic acid) | see footnote (I) | S-1 (j) | L1SVM
368.1652 (J) | [M-H]- | 85.48 | C19H28O5S | 1.4 | 93.1 | 2 isomers (e.g. DHEA sulfate) | see footnote (K) | S-1 (k) | SVMRFE_NL, L1SVM
384.2831 (L) | [M+CH3COO]- | 90.74 | C26H40S, C23H44S2, C21H40N2O2S, C29H36, C18H44N2O2S2 | 3.4-13.9 | 94.0-85.9 | - | - | S-1 (l) | SVMRFE_NL, L1SVM
398.2982 (L) | [M+HCOO]- | 90.74 | C27H42S, C24H46S2, C22H42N2O2S, C25H38N2O2, C19H46N2O2S2 | 3.8-1.4 | 94.0-86.9 | - | - | S-1 (l) | SVMRFE_NL, L1SVM
433.3256 (M) | [M+HCOO]- | 91.97 | C26H43NO4 | 14.8 | 98.8 | Lithocholic acid glycine conjugate | HMDB 00698 (N) | S-1 (m) | SVMRFE_NL, L1SVM
444.3037 (L) | [M-H]- | 90.74 | C24H40N6S, C28H45PS, C28H44O2S, C25H49PS2, C25H48O2S2 | 0.45-13.0 | 94.1-91.9 | - | - | S-1 (l) | SVMRFE_NL, L1SVM
479.3310 (M) | [M-H]- | 91.97 | C24H50NO6P | 13.7 | 96.6 | 8 glycerophosphocholine isomers (e.g. PC(P-16:0/0:0)) | see footnote (O) | S-1 (m) | SVMRFE_NL, L1SVM
481.2835 | [M-H]- | 106.07 | C22H44NO8P | 6.3 | 90.4 | 10 glycerophosphocholine isomers (e.g. PC(10:0/4:0)) | see footnote (P) | S-1 (n) | SVMRFE_NL, L1SVM
481.3047 | [M-H]- | 116.28 | C12H35N17O4, C12H36N17O2P, C17H50N5O4P3, C16H39N11O6, C17H51N5O2P4 | 2.2-14.9 | 95.1-93.4 | - | - | S-1 (o) | L1SVM
495.3210 | [M+H]+ | 109.68 | C21H46N5O6P, C21H45N5O8, C18H37N15O2, C19H37N13O3, C20H47N7O3P2 | 1.1-13.7 | 99.7-98.8 | - | - | S-1 (p) | SVMRFE_NL, L1SVM
499.9613 | [M-H]- | 166.34 | C21H8O13S, C21H9O11PS, C20H10N2O8P2S, C19H22P6S2, C18H4N4O12S | 2.0-14.5 | 96.6-96.1 | - | - | S-1 (q) | SVMRFE_NL, L1SVM
505.2842 | [M-H]- | 100.09 | C22H47N5P4, C22H46N5O2P3, C17H31N17O2, C20H36N13OP, C19H41N9O3P2 | 0.9-12.1 | 99.5-97.9 | - | - | S-1 (r) | SVMRFE_NL, L1SVM
505.3308 (Q) | [M+CH3COO]- | 147.77 | C28H49N3OP2, C29H49NO2P2, C27H39N9O, C29H48NO4P, C25H44N7O2P | 2.5-13.8 | 94.0-92.6 | - | - | S-1 (s) | SVMRFE_NL, L1SVM
507.3131 | [M-H]- | 112.77 | C28H45NO7, C28H46NO5P, C26H46N5OPS, C27H45N3O4S, C26H37N9O2 | 0.1-12.8 | 97.3-96.2 | - | - | S-1 (t) | L1SVM
509.3156 | [M-H]- | 121.27 | C24H48NO8P | 7.6 | 91.8 | 6 glycerophospholipid isomers (e.g. PE(9:0/10:0)) | see footnote (R) | S-1 (u) | SVMRFE_NL, L1SVM
519.3330 | [M+H]+ | 100.17 | C26H50NO7P | 1.0 | 99.0 | 3 PC(18:2/0:0) isomers (e.g. LysoPC(18:2(9Z,12Z))) | see footnote (S) | S-1 (v) | SVMRFE_NL, L1SVM
519.3459 (Q) | [M+HCOO]- | 147.77 | C26H46N7O2P, C27H57NP4, C29H51N3OP2, C26H45N7O4, C27H45N5O5 | 1.7-14.2 | 93.3-92.8 | - | - | S-1 (s) | SVMRFE_NL, L1SVM
563.3363 (Q) | [M-H]- | 147.77 | C26H47N9OP2, C24H37N17, C28H57NO2P4, C25H37N15O, C27H46N7O4P | 2.7-10.2 | 94.0-93.0 | - | - | S-1 (s) | SVMRFE_NL, L1SVM
757.5678 | [M+H]+ | 127.85 | C42H80NO8P | 7.5 | 83.3 | 31 glycerophospholipid isomers (e.g. PE-NMe(18:1(9E)/18:1(9E))) | see footnote (T) | S-1 (w) | L1SVM
759.5775 (U) | [M+Na]+ | 138.38 | C42H82NO8P | 0.4 | 42.6 | 18 glycerophosphocholine isomers (e.g. PC(14:0/20:1(11Z))) | see footnote (V) | S-1 (x) | L1SVM
781.5595 (U) | [M+H]+ | 138.38 | C44H80NO8P | 3.4 | 46.0 | 32 glycerophosphocholine isomers (e.g. PC(14:0/22:4(7Z,10Z,13Z,16Z))) | see footnote (W) | S-1 (x) | L1SVM
787.6000 (X) | [M+Na]+ | 136.68 | C44H86NO8P | 11.6 | 74.6 | 22 glycerophosphocholine isomers (e.g. PC(14:0/22:1(13Z))) | see footnote (Y) | S-1 (y) | SVMRFE_NL, L1SVM

(A) Matches to identified compounds were made using accurate mass measurements and isotope cluster matching. For species that could not be matched against metabolite databases, the top-matching formulae (according to score) are listed; for features matching fewer than five formulae, all formulae are shown. For features corresponding to multiple possible isomers, the following nomenclature is used: # isomers found including name of isomer [source (cross-listed source, if any)].
(B) Adduct analysis yielded multiple possible ion species for this feature. All are listed, as none could be matched against the databases.
(C) 16 isomers found including palmitic acid [LMFA 01010001 (HMDB 00220)], isopalmitic acid [LMFA 01020010], 2,6-dimethyl-tetradecanoic acid [LMFA 01020038], 2,8-dimethyl-tetradecanoic acid [LMFA 01020039], 3-methyl-pentadecanoic acid [LMFA 01020164], 2-propyl-tridecanoic acid [LMFA 01020165], 2-hexyl-decanoic acid [LMFA 01020166], 3-ethyl-3-methyl-tridecanoic acid [LMFA 01020167], 2-heptyl-nonanoic acid [LMFA 01020168], 6-ethyl-tetradecanoic acid [LMFA 01020169], 2,4-dimethyl-tetradecanoic acid [LMFA 01020170], 3,5-dimethyl-tetradecanoic acid [LMFA 01020171], 4-hexyl-decanoic acid [LMFA 01020172], 2-ethyl-2-butyl-decanoic acid [LMFA 01020173], 13-methyl-pentadecanoic acid [LMFA 01020192], 4,8,12-trimethyl-tridecanoic acid [LMFA 01020249].
(D) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(E) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(F) 12 isomers found including stearic acid [HMDB 00827 (LMFA 01010018, MID 189, MMCD cq_00998)], 10-methyl-heptadecanoic acid [MID 4292 (LMFA 01020013)], (+)-isostearic acid [MID 4293 (LMFA 01020014)], 2,6-dimethyl-hexadecanoic acid [MID 4324 (LMFA 01020042)], 4,8-dimethyl-hexadecanoic acid [MID 4325 (LMFA 01020043)], 2,14-dimethyl-hexadecanoic acid [MID 4326 (LMFA 01020044)], 4,14-dimethyl-hexadecanoic acid [MID 4327 (LMFA 01020045)], 6,14-dimethyl-hexadecanoic acid [MID 4328 (LMFA 01020046)], lambda isostearic acid [MID 4493 (LMFA 01020093)], neostearic acid [MID 4620 (LMFA 01020094)], 11,15-dimethyl-hexadecanoic acid [MID 34604 (LMFA 01020175)], 15-methyl-heptadecanoic acid [MID 34632 (LMFA 01020205)].
(G) 6 isomers found including Gln His Ala [MID 23091], Gln Ala His [MID 22217], Ala His Gln [MID 21229], Ala Gln His [MID 16023], His Gln Ala [MID 20595], His Ala Gln [MID 18707].
(H) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(I) 11 isomers found including 5b-chol-9(11)-en-24-oic Acid [MID 42731 (LMST 04010142)], 5b-chol-11-en-24-oic Acid [MID 42732 (LMST 04010143)], 5b-chol-14-en-24-oic Acid [MID 42733 (LMST 04010144)], 5b-chol-2-en-24-oic Acid [MID 42757 (LMST 04010170)], 5b-chol-3-en-24-oic Acid [MID 42846 (LMST 04010263)], Chol-4-en-24-oic Acid [MID 42847 (LMST 04010264)], Chol-5-en-24-oic Acid [MID 42848 (LMST 04010265)], 5b-chol-6-en-24-oic Acid [MID 42849 (LMST 04010266)], 5b-chol-7-en-24-oic Acid [MID 42850 (LMST 04010267)], 5b-chol-8-en-24-oic Acid [MID 42851 (LMST 04010268)], 5b-chol-8(14)-en-24-oic Acid [MID 42852 (LMST 04010269)].
(J) Adduct analysis yielded multiple possible ion species for this feature. Only 1 species could be tentatively identified.
(K) 2 isomers found including DHEA sulfate [HMDB 01032 (LMST 05020010)], testosterone sulfate [HMDB 02833].
(L) Adduct analysis yielded multiple possible ion species for this feature. All are listed, as none could be matched against the databases.
(M) Adduct analysis yielded multiple possible ion species for this feature. Only species that could be tentatively identified are listed.
(N) Cross-listed as MMCD cq-10750 and MID 5666.
(O) 8 isomers found including PC(P-16:0/0:0) [HMDB 10407 (LMGP 01070006)], PC(O-16:1/0:0) [LMGP 01050100, 01050101, 01050102, 01050103, 01050104, 01070004, 01070005].
(P) 10 isomers found including PC(10:0/4:0) [LMGP 01010403], PC(12:0/2:0) [LMGP 01010443], PC(6:0/8:0) [LMGP 01011233, 01011234], PC(7:0/7:0) [LMGP 01011238, 01011239, 01011240], PC(8:0/6:0) [LMGP 01011248, 01011249], PC(9:0/5:0) [LMGP 01011269].
(Q) Adduct analysis yielded multiple possible ion species for this feature. All are listed, as none could be matched against the databases.
(R) 6 isomers found including PE(9:0/10:0)[U] [MID 40490 (LMGP 02010091)], PE(10:0/9:0)[U] [MID 40669 (LMGP 02010272)], PC(14:0/2:0) [LMGP 01010504], PC(8:0/8:0) [LMGP 01011251, 01011252, 01011253].
(S) 3 isomers found including glycoursodeoxycholic acid 3-sulfate [HMDB 02409 (MMCD cq_17361, MID 6670)], glycochenodeoxycholic acid 7-sulfate [HMDB 02496 (MMCD cq_17159, MID 6692)], glycochenodeoxycholate-3-sulfate [HMDB 02497 (MMCD cq_17507, MID 6702)].
(T) 31 isomers found including PE-NMe(18:1/18:1) [LMGP 02010331 (MMCD cq_17959), 02010333, 02010338, 02010350], PC(16:0/18:2) [LMGP 01010585, 01010586, 01010587, 01010588, 01010589, 01010590, 01010591, 01010592, 01010593, 01010594, 01010595, 01010596], PC(16:1/18:1) [LMGP 01010678, 01010680, 01010687, 01010688, 01010689], PC(17:1/17:1) [LMGP 01010726, 01010727, 01010728], PC(18:0/16:2(2E,4E)) [LMGP 01010745], PC(18:1/16:1) [LMGP 01010886, 01010887], PC(18:2/16:0) [LMGP 01010920, 01010926, 01010932, 01010933].
(U) Adduct analysis yielded several possible ion species for the selected feature. Only species having tentative matches are listed.
(V) 18 isomers found including PC(14:0/20:1(11Z)) [HMDB 07879], PC(16:0/18:1) [LMGP 01010005, 01010575, 01010576, 01010577, 01010578, 01010579, 01010580, 01010581, 01010582, 01010583, 01010584], PC(16:1/18:0) [LMGP 01010679, 01010686], PC(18:0/16:1(9Z)) [LMGP 01010744], PC(18:1/16:0) [LMGP 01010874, 01010884, 01010885].
(W) 32 isomers found including PC(14:0/22:4(7Z,10Z,13Z,16Z)) [HMDB 07889], PC(16:0/20:4) [LMGP 01010007, 01010629, 01010630, 01010631], PC(18:0/18:4) [LMGP 01010772, 01010773, 01010774, 01010775, 01010776], PC(18:1/18:3) [LMGP 01010897, 01010898, 01010899], PC(18:2/18:2) [LMGP 01010918, 01010919, 01010921, 01010922, 01010923, 01010924, 01010925, 01010927, 01010928, 01010929, 01010930, 01010937, 01010938, 01010939], PC(18:3/18:1) [LMGP 01010949, 01010955], PC(20:4/16:0) [LMGP 01011049, 01011050, 01011056].
(X) Adduct analysis yielded several possible ion species for the selected feature. Only 1 species could be tentatively identified.
(Y) 22 isomers found including PC(14:0/22:1(13Z)) [HMDB 07887], PC(16:0/20:1(11Z)) [LMGP 01010618], PC(18:0/18:1) [LMGP 01010749, 01010750, 01010751, 01010752, 01010753, 01010754, 01010755, 01010756, 01010757, 01010758, 01010759, 01010760, 01010761, 01010762, 01010763], PC(18:1/18:0) [LMGP 01010840, 01010875, 01010888, 01010889], PC(20:1(11Z)/16:0)[U] [LMGP 01011037].

Figure S-1: Mass spectra for L1SVM and SVMRFE_NL-selected features from the multimode dataset (the structure and name of the first isomer listed are included as an inset in the spectra of features that were tentatively matched to compounds).
[Spectra, panels (a) to (y): intensity (A.U.) versus m/z.]
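Three features in Table S-7 (206.1306, 220.1463, and 266.1518 Da, all at RT 176.83 min and all assigned to spectrum S-1 (c)) arise from a single observed ion at m/z 265.1445 interpreted as [M+CH3COO]-, [M+HCOO]-, or [M-H]-. The Python sketch below converts an observed m/z into the neutral mass implied by each ion species listed in Table S-7. It assumes singly charged ions and standard monoisotopic masses for the proton, electron, sodium, formate, and acetate; the dictionary and function names are illustrative.

```python
# Monoisotopic masses (Da).
ELECTRON = 0.00054858
PROTON = 1.00727646688
H, C, O, NA = 1.00782503207, 12.0, 15.99491461956, 22.9897692809
FORMATE = H + C + 2 * O           # HCOO
ACETATE = 2 * C + 3 * H + 2 * O   # CH3COO

# Mass difference between the singly charged ion and the neutral molecule M.
ION_SHIFT = {
    "[M+H]+": PROTON,
    "[M+Na]+": NA - ELECTRON,
    "[M-H]-": -PROTON,
    "[M+HCOO]-": FORMATE + ELECTRON,
    "[M+CH3COO]-": ACETATE + ELECTRON,
}


def neutral_mass(mz, species):
    """Neutral mass M implied by an observed m/z for a given ion species."""
    return mz - ION_SHIFT[species]


for species in ("[M-H]-", "[M+HCOO]-", "[M+CH3COO]-"):
    print(species, round(neutral_mass(265.1445, species), 4))
# [M-H]- 266.1518
# [M+HCOO]- 220.1463
# [M+CH3COO]- 206.1306
```

Each candidate neutral mass can then be searched against metabolite databases, which is why one spectrum can yield several rows in Table S-7.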