Metabolomics metadata

1. Plant context metadata

1.1. Plant materials

1.1.1. BioSource Species
Soybean Glycine max (9 varieties)

1.1.2. Genotypes/Varieties
Williams, A3127, A3469, A3555, A3733/CX329 (CX375), AG3701, AG3803, CX366, and AG3705

1.1.3. Organ specification
Mature seeds

1.1.4. Growth conditions
Nine soybean varieties representing a genetic lineage from Williams (1972) to A3555 (2008) were grown at two sites in Illinois (Jerseyville [ILJA] and Jacksonville [ILJA]) during the 2011 season. The varieties included six conventional and three glyphosate-tolerant lines. Starting seeds were planted in a randomized complete block design with six replicates. Soybean plants were treated with maintenance pesticides as necessary throughout the growing season at both sites. The three Roundup Ready varieties were not treated with glyphosate.

1.1.5. Experimental conditions
Same as the growth conditions. Soybean seeds of 5-6 biological replicates were harvested at maturity in 2011. Seeds for each replicate were homogenized by grinding with dry ice to a fine powder, lyophilized, and stored frozen at approximately -20°C prior to analysis. We weighed 70 mg dry weight (DW) for CE-TOF-MS analysis, 5 mg DW for GC-TOF-MS analysis, 50 mg DW for LC-q-TOF-MS analysis to detect polar metabolites, and 15 mg DW for lipid profiling.

2. Chemical analysis metadata

Chemicals
All chemicals and reagents used in this study were of spectrometric grade. Chemicals other than the isotope reference compounds and the reagents for silylation were purchased from Sigma Aldrich (Tokyo, Japan), Nacalai Tesque (Kyoto, Japan), or Wako Pure Chemical Industries (Osaka, Japan).
The six stable isotope compounds ([13C5]-proline, [2H4]-succinic acid, [2H6]-2-hydroxybenzoic acid, [13C3]-myristic acid, [13C12]-sucrose, and [2H7]-cholesterol) were purchased from Cambridge Isotope Laboratories (Andover, MA, USA); [13C5,15N]-glutamic acid and [13C6]-glucose from Spectra Stable Isotopes (Columbia, Maryland, USA); [2H4]-1,4-diaminobutane from C/D/N Isotopes (Pointe-Claire, Quebec, Canada); and [13C4]-hexadecanoic acid from Icon (Mt. Marion, NY, USA). The reagent for trimethylsilylation, N-methyl-N-trimethylsilyltrifluoroacetamide (MSTFA), was purchased from Tokyo Chemical Industry (Tokyo, Japan).

2.1. BioSource amount
We weighed 70 mg dry weight (DW) of the lyophilized samples for CE-TOF-MS analysis, 5 mg DW for GC-TOF-MS analysis, 50 mg DW for LC-q-TOF-MS analysis to detect polar metabolites, and 15 mg DW for lipid profiling.

2.2. Sample processing and extraction

2.2.1. Extraction for CE-TOF-MS
Seventy mg DW of each sample was extracted in 20 volumes of methanol containing 8 μM of two reference compounds (methionine sulfone for cation analyses and camphor-10-sulfonic acid for anion analyses) using a Retsch mixer mill MM310 at a frequency of 27 Hz for 1 min. The extracts were then centrifuged at 15,000 × g for 3 min at 4°C. A 500-μl aliquot of the supernatant was transferred into a tube, and 500 μl of chloroform and 200 μl of water were added to perform liquid-liquid partitioning. The upper layer was evaporated for 30 min at 45°C in a centrifugal concentrator to obtain two layers. To remove high-molecular-weight compounds such as oligosaccharides, the upper layer was centrifugally filtered through a Millipore 5-kDa cutoff filter at 9,100 × g for 120 min at 4°C. The filtrate was dried for 120 min in a centrifugal concentrator. The residue (ca. 25 mg of each sample) was dissolved in 20 μl of water containing 200 μM of internal standards (3-aminopyrrolidine for cation analyses and trimesic acid for anion analyses), which were used to correct migration times in the peak annotation step.

2.2.2. Extraction and derivatization for GC-TOF-MS
Each sample was extracted with a 5-mm zirconia bead at a concentration of 100 mg DW of powder per ml of extraction medium (methanol/chloroform/water [3:1:1 v/v/v]) containing 10 stable isotope reference compounds, at 4°C in a mixer mill (MM301; Retsch, Haan, Germany) at a frequency of 15 Hz. Each isotope compound was adjusted to a final concentration of 15 ng per 1-μl injection volume. After centrifugation for 5 min at 15,100 × g, a 200-μl aliquot of the supernatant was transferred to a glass insert vial. The extracts were evaporated to dryness in an SPD2010 SpeedVac® concentrator (Thermo Fisher Scientific, Waltham, MA, USA). We used extracts from 1-mg DW samples for derivatization, i.e., methoximation and silylation. For methoximation, 30 μl of methoxyamine hydrochloride (20 mg/ml in pyridine) was added to the sample. After 17 h of derivatization at room temperature, the sample was trimethylsilylated for 1 h using 30 µl of MSTFA at 37°C with shaking. All derivatization steps were performed in a vacuum glove box VSC-100 (Sanplatec, Osaka, Japan) filled with 99.9995% (G3 grade) dry nitrogen.

2.2.3. Extraction for LC-q-TOF-MS to detect polar metabolites
Fifty mg DW of each sample was extracted in 50 volumes of extraction medium (methanol/water [2:5 v/v]) containing two reference compounds (0.5 mg/l flavonol-2'-sulfonic acid and 1.0 mg/l ampicillin) using a mixer mill MM301 (Retsch) at a frequency of 20 Hz for 5 min at 4°C. After centrifugation for 10 min at 15,000 × g, the supernatant was transferred into a 2 ml tube.
Thirty volumes of methanol were added to the tube, and the extraction was repeated using the mixer mill at a frequency of 20 Hz for 5 min at 4°C. After centrifugation for 10 min at 15,000 × g, the resulting supernatant was transferred into the tube. A 120-μl aliquot of the extracts was filtered using an Oasis® HLB μElution plate (30 μm, Waters Co., Massachusetts, USA). The extracts (100 μl) were transferred into a 2 ml tube and evaporated to dryness in an SPD2010 SpeedVac® concentrator from ThermoSavant (Thermo Fisher Scientific). The extracts were dissolved in 100 μl of 20% aqueous methanol containing 0.5 mg l−1 lidocaine and 10-camphorsulfonic acid.

2.2.4. Extraction for LC-q-TOF-MS to detect lipids
Each sample (15 mg DW) was extracted with 80 volumes of methyl tert-butyl ether/methanol (3:1, v/v) containing 20 μM of 1,2-dioctanoyl-sn-glycero-3-phosphocholine (Sigma). After the extraction solvent was added, the samples were vigorously mixed using a vortex mixer. Twenty-five volumes of water were then added to each sample, followed by vigorous mixing for 5 min at room temperature. After standing for 15 min on ice, the samples were centrifuged at 1,000 × g at 5°C for 5 min. The supernatant (50 μl) was transferred to a 2 ml tube. Each extract was evaporated to dryness in an SPD2010 SpeedVac® concentrator (Thermo Fisher Scientific). The residue was dissolved in 1,250 μl of ethanol and centrifuged at 10,000 × g at 45°C for 15 min. Two hundred microliters of the supernatant was transferred to a glass tube for lipid analysis.

2.3. Analytical conditions

2.3.1. CE-TOF-MS conditions
All CE-TOF-MS experiments were performed using an Agilent G7100A CE Instrument (Agilent Technologies, Sacramento, CA), an Agilent G6224A TOF LC/MS system, an Agilent 1200 Infinity series G1311C Quad Pump VL, the G1603A Agilent CE-MS adapter, and the G1607A Agilent CE-ESI-MS sprayer kit.
The G1601BA 3D-CE ChemStation software for CE and G3335-64002 MH Workstation were used.

Separation column and electrolytes:
Separations were carried out using a fused silica capillary (50 μm i.d. × 100 cm total length) filled with 1 M formic acid for cation analyses or with 20 mM ammonium formate (pH 10.0) for anion analyses as the electrolyte. The capillary temperature was maintained at 20°C.

Sample injection:
The sample solutions (11.25 μl of extracts, ca. 0.6 μg of each sample) were injected at 50 mbar for 15 sec (15 nl). The sample tray was cooled below 4°C.

Separation parameters:
Prior to each run, the capillary was flushed with electrolyte for 5 min. The applied voltage for separation was set at 30 kV. Fifty percent (v/v) methanol/water containing 0.5 μM reserpine was delivered as the sheath liquid at 10 μl/min.

Ionization:
ESI-TOF-MS was conducted in the positive ion mode for cation analyses or in the negative ion mode for anion analyses, and the capillary voltage was set at 4 kV.

Dry gas condition:
A flow rate of heated dry nitrogen gas (heater temperature 300°C) was maintained at 10 psig.

Voltage settings in TOF-MS:
The fragmentor, skimmer, and Oct RFV voltages were set at 110 V, 50 V, and 160 V for cation analyses or at 120 V, 60 V, and 220 V for anion analyses, respectively.

Mass calibration:
Automatic recalibration of each acquired spectrum was performed using the reference masses of reference standards. The methanol dimer ion ([2M+H]+, m/z 65.0597) and reserpine ([M+H]+, m/z 609.2806) for cation analyses or the formic acid dimer ion ([2M-H]-, m/z 91.0037) and reserpine ([M-H]-, m/z 607.2661) for anion analyses provided the lock masses for exact mass measurements.

Mass data acquisition:
Exact mass data were acquired at a rate of 1.5 cycles/sec over a 50-1000 m/z range.
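The lock masses quoted under Mass calibration can be cross-checked with a short Python sketch that recomputes each ion's m/z from monoisotopic atomic masses and reports the deviation in ppm. The element masses, the proton mass, the molecular formulas (methanol CH4O, formic acid CH2O2, reserpine C33H40N2O9), and all function names are assumptions supplied for illustration; none of this is part of the acquisition software described here.

```python
# Monoisotopic atomic masses in u (standard reference values, not taken from the source).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.00727646688  # mass of H+ (hydrogen atom minus one electron)

# Molecular formulas (assumed for this sketch).
METHANOL = {"C": 1, "H": 4, "O": 1}
FORMIC_ACID = {"C": 1, "H": 2, "O": 2}
RESERPINE = {"C": 33, "H": 40, "N": 2, "O": 9}


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())


def ion_mz(formula, n_molecules, charge):
    """m/z of a singly charged [nM+H]+ (charge=+1) or [nM-H]- (charge=-1) ion."""
    return n_molecules * neutral_mass(formula) + charge * PROTON


def ppm_error(quoted, theoretical):
    """Relative deviation of a quoted m/z from the theoretical value, in ppm."""
    return (quoted - theoretical) / theoretical * 1e6


# (label, formula, molecules per ion, charge, lock mass quoted in the text)
LOCK_MASSES = [
    ("methanol [2M+H]+", METHANOL, 2, +1, 65.0597),
    ("reserpine [M+H]+", RESERPINE, 1, +1, 609.2806),
    ("formic acid [2M-H]-", FORMIC_ACID, 2, -1, 91.0037),
    ("reserpine [M-H]-", RESERPINE, 1, -1, 607.2661),
]


def check_lock_masses():
    """Return (label, theoretical m/z, ppm deviation of the quoted value) per ion."""
    rows = []
    for label, formula, n_molecules, charge, quoted in LOCK_MASSES:
        theoretical = ion_mz(formula, n_molecules, charge)
        rows.append((label, theoretical, ppm_error(quoted, theoretical)))
    return rows


if __name__ == "__main__":
    for label, theoretical, ppm in check_lock_masses():
        print(f"{label:22s} {theoretical:10.4f} {ppm:+6.2f} ppm")
```

The same `ppm_error` helper could be applied to any annotated peak to judge whether an observed m/z lies within a chosen mass tolerance of a candidate formula.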
Quality control:
In every single-sequence analysis (maximum 36 samples) on our CE-TOF-MS system, we analyzed the standard compound mixture at the start and the end of the sample analyses. The detected peak areas of the standard compound mixture were checked for reproducible sensitivity. The standard compound mixture was composed of major detectable metabolites, including amino acids and organic acids, and was freshly prepared at least once every six months. In all analyses in this study, there were no differences in the sensitivity of the standard compound mixture.

2.3.2. GC-TOF-MS conditions
Using the splitless mode of a CTC CombiPAL autosampler (CTC Analytics, Zwingen, Switzerland), 1 μl of each sample (equivalent to 1.4 µg DW) was injected into an Agilent 6890N gas chromatograph (Agilent Technologies, Wilmington, DE, USA) featuring a 30 m × 0.25 mm inner diameter fused-silica capillary column with a chemically bound 0.25-μm film Rxi-5 Sil MS stationary phase (Restek, Bellefonte, PA, USA), connected in tandem to a fused silica tube (1 m, 0.15 mm). An MS column change interface (ms NoVent-J; SGE, Yokohama, Japan) was used to prevent air and water from entering the MS during column change-over. Helium was the carrier gas at a constant flow rate of 1 ml min-1. The temperature program for GC-MS analysis started with a 2-min isothermal step at 80°C, followed by temperature ramping at 30°C min-1 to a final temperature of 320°C, which was maintained for 3.5 min. The transfer line and the ion source temperatures were 250 and 200°C, respectively. Ions were generated by a 70-eV electron beam at an ionization current of 2.0 mA. The acceleration voltage was turned on after a solvent delay of 222 sec. Data acquisition was on a Pegasus IV TOF mass spectrometer (LECO, St. Joseph, MI, USA); the acquisition rate was 30 spectra s-1 over the mass range m/z 60–800. Alkane standard mixtures (C8 - C20 and C21 - C40) purchased from Sigma-Aldrich (Tokyo, Japan) were used for calculating the retention index (RI) (Schauer N, et al. (2005) GC-MS libraries for the rapid identification of metabolites in complex biological samples. FEBS Lett 579(6):1332-1337). For quality control, we injected methylstearate into every 6th sample. The sample run order was randomized in single-sequence analyses. We analyzed the standard compound mixtures using the same sequence analysis procedures.

2.3.3. LC-q-TOF-MS conditions to detect polar metabolites
After preparation of the extracts, the sample extracts (1 μl) were analyzed using an LC-MS system equipped with an electrospray ionization (ESI) interface (LC, Waters Acquity UPLC system; MS, Waters Xevo G2 Q-Tof). The analytical conditions were as follows.
LC: column, Acquity bridged ethyl hybrid (BEH) C18 (particle size 1.7 μm, 2.1 × 100 mm, Waters); solvent system, solvent A (water containing 0.1% formic acid) and solvent B (acetonitrile with 0.1% formic acid); gradient program, 0.5% of solvent B at 0 min, 0.5% of solvent B at 0.1 min, 99.5% of solvent B at 12.0 min, 99.5% of solvent A and 0.5% of solvent B at 12.0 min, 0.5% of solvent B at 12.1 min, and 0.5% of solvent B at 15.0 min; flow rate, 0.3 ml/min; temperature, 40°C.
MS detection: capillary voltage, +3.0 kV; cone voltage, 25.0 V; source temperature, 120°C; desolvation temperature, 450°C; cone gas flow, 50 l per h; desolvation gas flow, 800 l per h; collision energy, 6 V; mass range, m/z 100–1500; scan duration, 0.1 sec; interscan delay, 0.014 sec; mode, centroid; polarity, positive; Lockspray (leucine enkephalin): scan duration, 1.0 sec; interscan delay, 0.1 sec.
The data were recorded using MassLynx version 4.1 software (Waters).

2.3.4. LC-q-TOF-MS conditions to detect lipids
Sample extracts (1 μl) were analyzed using an LC-MS system equipped with an electrospray ionization (ESI) interface (HPLC, Waters Acquity UPLC system; MS, Waters Xevo G2 Q-Tof). A two-solvent (A and B) system was used for separation of each metabolite. The compositions of these solvents were as follows: solvent A, acetonitrile:water:1 M ammonium acetate:formic acid = (158 g:800 g:10 ml:1 ml); solvent B, acetonitrile:2-propanol:water:1 M ammonium acetate:formic acid = (79 g:711 g:10 ml:1 ml). The analytical conditions were as follows.
HPLC: column, Acquity UPLC HSS T3 (particle size 1.8 μm, 1.0 mm i.d. × 50 mm long, Waters); gradient program, 35% B at 0 min, 70% B at 3 min, 85% B at 7 min, 90% B at 10 min, 90% B at 12 min, and 35% B at 12.5 min; flow rate, 0.15 ml/min; temperature, 55°C.
MS detection: capillary voltage, +3.0 kV; cone voltage, 20 V for positive mode and 40 V for negative mode; source temperature, 120°C; desolvation temperature, 450°C; cone gas flow, 50 l/h; desolvation gas flow, 450 l/h; collision energy, 6 V; detection mode, scan (m/z 100–2000; scan time, 0.5 sec; centroid).
The scans were repeated for 15 min in a single run. The data were recorded using MassLynx version 4.1 software (Waters).

2.4. Data processing

2.4.1. Data processing for CE-TOF-MS data
An original data file (.wiff) was converted to a unique binary file (.kiff) using in-house software (undisclosed). Peaks were picked and aligned among samples automatically using a second in-house program (undisclosed). Peaks were annotated automatically with the same software by comparison with the detected m/z and migration time values of standard compounds, including internal standards. For normalization, the individual area of the detected peaks was divided by the peak area of the internal reference standards.
Based on the calibration curves for standard compounds, peak area values were converted into values corresponding to amounts.

2.4.2. Data processing for GC-TOF-MS data
Nonprocessed MS data from GC-TOF-MS analysis were exported in NetCDF format, generated by chromatography processing and mass spectral deconvolution software (LECO ChromaTOF version 3.22; LECO, St. Joseph, MI, USA), to MATLAB 6.5 or MATLAB 2011b (MathWorks, Natick, MA, USA) for all data-pretreatment procedures, e.g., smoothing, alignment, time-window setting, H-MCR, and RDA (Jonsson P, et al. (2006) Predictive metabolite profiling applying hierarchical multivariate curve resolution to GC-MS data--a potential tool for multi-parametric diagnosis. J Proteome Res 5(6):1407-1414). The resolved MS spectra were matched against reference mass spectra using the NIST mass spectral search program for the NIST/EPA/NIH mass spectral library (version 2.0) and our custom peak-annotation software written in Java. Peaks were identified or annotated based on their RIs, a comparison of the reference mass spectra with the Golm Metabolome Database (GMD) released from CSB.DB (Kopka J, et al. (2005) GMD@CSB.DB: the Golm Metabolome Database. Bioinformatics 21(8):1635-1638), and our in-house spectral library. The metabolites were identified by comparison with RIs from the library databases (GMD and our own library) and the RIs of authentic standards. The metabolites were defined as annotated metabolites after comparison with the mass spectra and the RIs from these two libraries. The data matrix was normalized using the CCMN algorithm for further analysis (Redestig H, et al. (2009) Compensation for systematic cross-contribution improves normalization of mass spectrometry based metabolomics data. Anal Chem 81(19):7974-7980).

2.4.3. Data processing for LC-q-TOF-MS data to detect polar metabolites
The data matrix was aligned by MassLynx version 4.1 (Waters). The profiling data files were converted to the NetCDF format using the DataBridge function of the MassLynx software. The data matrix was obtained after alignment and deisotoping of the set of NetCDF data files. For normalization, low-intensity peaks (less than 500 counts) were first removed, and the intensity values of the remaining peaks were then divided by those of lidocaine ([M+H]+, m/z 235.1804) and 10-camphorsulfonic acid ([M-H]-, m/z 231.0691).

2.4.4. Data processing for LC-q-TOF-MS data to detect lipids
The data matrix was generated using MarkerLynx XS (Waters) from the profiling data files recorded in the MassLynx format (raw). The data matrices were processed using an in-house Perl script. To normalize the peak intensity values among the metabolic profile data, the original peak intensity values were divided by that of the internal standard (didecanoyl-sn-glycerophosphocholine at m/z 566.382 [M + H]+ and at m/z 610.372 [M + HCOO]– for the positive and negative ion modes, respectively).

2.5. Statistical data analysis for metabolite profile data
The multi-platform data were summarized by unifying metabolite identifiers to a common referencing scheme using the MetMask tool (Redestig H, Kusano M, Fukushima A, Matsuda F, Saito K, Arita M: Consolidating metabolite identifiers to enable contextual and multi-platform metabolomics data analysis. BMC Bioinformatics 2010, 11:214). The four matrices were then concatenated, and correlated peaks with the same annotation were replaced by their first principal component. All data were log2 or log10 transformed prior to further data analysis.
Principal component analysis (PCA) was performed on unit-variance scaled metabolite matrices (observations, 81 samples; variables, 681 or 701 peaks) with log10 transformation using the pcaMethods package (Ref: Stacklies) or SIMCA-P+ 13.0 software (Umetrics AB, Umeå, Sweden).
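The preprocessing named above (log10 transformation followed by unit-variance scaling) and the extraction of a leading principal component can be sketched in plain Python. This is not the pcaMethods or SIMCA-P+ implementation used in the study: it substitutes a simple power iteration for those packages' algorithms, returns only the first component, and assumes a strictly positive samples × peaks matrix with no missing values and no constant columns. The function names and the small demonstration matrix are hypothetical.

```python
import math


def log10_uv_scale(matrix):
    """log10-transform, then center each column and scale it to unit variance.

    Assumes strictly positive values and no constant columns (a constant
    column would have zero standard deviation and cause a division by zero).
    """
    logged = [[math.log10(value) for value in row] for row in matrix]
    n = len(logged)
    columns = list(zip(*logged))
    means = [sum(col) / n for col in columns]
    sds = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in logged]


def first_component(x, n_iter=500, tol=1e-12):
    """First principal component of a scaled matrix by power iteration on X^T X.

    Returns (scores, loadings, explained variance fraction).
    """
    n, p = len(x), len(x[0])
    w = [1.0 / math.sqrt(p)] * p  # starting loading vector (unit length)
    for _ in range(n_iter):
        t = [sum(row[j] * w[j] for j in range(p)) for row in x]       # X w
        w_new = [sum(x[i][j] * t[i] for i in range(n)) for j in range(p)]  # X^T (X w)
        norm = math.sqrt(sum(v * v for v in w_new))
        w_new = [v / norm for v in w_new]
        converged = max(abs(a - b) for a, b in zip(w, w_new)) < tol
        w = w_new
        if converged:
            break
    scores = [sum(row[j] * w[j] for j in range(p)) for row in x]
    total = sum(v * v for row in x for v in row)
    explained = sum(t * t for t in scores) / total
    return scores, w, explained


if __name__ == "__main__":
    # Hypothetical intensities: 4 samples (rows) x 3 peaks (columns).
    demo = [
        [10.0, 100.0, 1000.0],
        [20.0, 220.0, 900.0],
        [40.0, 390.0, 1100.0],
        [80.0, 810.0, 950.0],
    ]
    scores, loadings, explained = first_component(log10_uv_scale(demo))
    print("scores:", [round(s, 3) for s in scores])
    print("loadings:", [round(v, 3) for v in loadings])
    print("explained variance fraction:", round(explained, 3))
```

Scaling each peak to unit variance before PCA gives every metabolite equal weight regardless of its absolute intensity, which matters when matrices from platforms with very different dynamic ranges are combined.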