Characterisation and quantitation expression analysis of recombinant proteins in plant complex mixtures using nanoUPLC mass spectrometry

André M. Murad1, Gustavo H. M. F. Souza2, Jerusa S. Garcia3, Elíbio L. Rech1*

1 Embrapa Genetic Resources and Biotechnology, Laboratory of Gene Transfer, Parque Estação Biológica, PqEB, Av. W5 Norte, Brasília, DF, 70770-917, Brazil
2 Waters Corporation, MS Applications Research and Development Laboratory, Alameda Tocantins, 125, 27th floor, West Side, Alphaville, São Paulo, SP, 06455-020, Brazil
3 Alfenas Federal University, Institute of Exact Sciences, Alfenas, MG, 37170-000, Brazil
*corresponding author

Keywords: Soybean, IdentityE, ExpressionE, MSE, ProteinLynx Global SERVER

ABSTRACT
Identification of a recombinant protein expressed in a total soluble protein (TSP) plant extract by mass spectrometry is desirable and necessary to accelerate further processing steps. The protocol consists of an initial TSP sample preparation and trypsin digestion, followed by preliminary characterization of the recombinant proteins expressed in TSP samples of transgenic soybean seeds using nanoUPLC-MSE. As little as 50 µg of TSP can be effectively analyzed. Experimental data for the TSP extraction and sample preparation are discussed. The whole process takes up to 3 days.

INTRODUCTION
The production of recombinant protein is an important step in several academic, industrial and pharmaceutical processes. Several heterologous protein expression systems are available, including bacterial1, mammalian cell-culture2 and plant3, 4 systems. Although these comprise the main production systems, the search for novel methods to increase protein yield, facilitate manipulation and reduce cost continues.
Seeds are a vital alternative for recombinant protein production for several reasons: they can undergo long-term storage at ambient temperatures5, 6, they can provide an appropriate biochemical environment for protein stability through the creation of specialised storage compartments6, 7, they are not contaminated by human or animal pathogens8, they do not undergo non-enzymatic hydrolysis or protease degradation owing to their desiccation characteristics5, 8 and they do not carry the phenolic substances that are present in tobacco leaves, which is important for downstream processing3, 8. We recently produced several transgenic soybean plants expressing important pharmaceutical molecules, such as proinsulin6, human growth hormone (hGH)9 and human coagulation factor IX (hFIX)10, showing the viability of this system. On the other hand, producing these transgenic lines is extremely time consuming11: at least 150 days are required to obtain the first seeds and another 3 years to obtain a homozygous line. At the early stage, little material is available for recombinant protein purification; as a result, the detection, quantification and characterisation of recombinant molecules rely mainly on the manipulation of total soluble protein (TSP), a complex mixture in which the protein of interest is present at low abundance. Thus, we need a method that detects, qualifies and quantifies recombinant proteins in TSP using less than ¼ of a single seed mass (50 mg). Typically, a recombinant protein is identified by western blot analysis12 and quantified by enzyme-linked immunosorbent assays (ELISAs)13.
These methods are widely used because they are simple and relatively fast for identification and quantification, but they lack sensitive detection ability when small amounts of antigens are used, or no antibody is available, or a false positive is found and there is no way to verify the quality, amino acid sequence or post-translational modification of the recombinant protein. Two dimensional electrophoresis (2-DE) has been developed for proteomics14, 15, and because of its association with mass spectrometry, it has become a primary tool for the identification and characterisation of plant complex mixtures15, 16. 2-DE can also be used for quantification and protein mapping of tissues17, comparative proteomics18, 19 and post-translational identification20, but it requires a minimum sample amount, cannot detect molecules in low abundance, needs spot manipulations for good identification15, is mainly performed by peptide mass fingerprinting (PMF)21, 22, and has difficulty in analysing proteins with similar mass and pI because they appear as a single spot. The combination of gel and liquid chromatography mass spectrometry (LC-MS) methods may result in better identification of proteins in complex samples23, 24 , overcoming the problems of 2-DE. Liquid chromatography (LC) increases the low detection/resolution of complex mixtures on mass spectrometers (MS)25. Furthermore, the analysis of peptides or complex samples commonly known as “system samples” that are digested by trypsin is key in the detection of low abundance proteins, but this technique has limitations in terms of the analyte dilution and the minimum amounts of complex protein mixtures needed to guarantee a good dynamic range and detection of low abundance proteins15, 25-28. 
Nano-scale liquid chromatography with 2D separations as a strong cation exchange (SCX) followed by reverse-phase (RP) chromatography or 2D RPxRP using two pH and acetonitrile pulses combined with mass spectrometry with data independent acquisitions (nanoLC-MSE) has several benefits for proteome analysis. Among these benefits are detection and linear sequence structural information at the femtomole level29, 30 , small surface areas and minimal dead-volumes, which minimises analyte losses due to surface adsorption, and low flow rates that reduce analyte dilution. Thereby, analytes of low abundance can be separated with a high recovery rate when associated with a high dynamic range and a prevailing MS detection system 31. Recently, the nanoLC-MS method was used for the detection of differences in expression of soybean plasma membrane proteins under osmotic stress32, the regulation of stress identification on tomatoes induced by iron deficiency33 and the detection of neuropeptides secreted in Cancer borealis34, demonstrating the capability and potential of this method. Moreover, nanoLC-MSE is an important tool in post-translational characterisation of proteins, such as the identification of N-terminal peptide modifications in the chloroplast proteome35, the analysis of human protein oxidations leading to functional reduction/annulation36, and the characterisation of the phosphorylation pattern of several phosphatase splice variants expressed in a human cell line37, 38. Finally, quantification is also possible with the nanoLC-MS technique using labelling methods such as (18O) labelling peptides39 and the iTRAQ™ method40, based on relative quantification methods, such as the use of stochastic measurements between mass and intensity deviations for each ion detected41 or the absolute quantification based on a constant ion current acquired with low (MS) and high energies (MS/MS) into the mass spectrometer, called MSE 42-45. We describe herein (Fig. 
1) an easy-to-handle, label-free nanoUPLC-MSE method with absolute quantification and small sample usage for the detection, quantification and characterisation of low abundance recombinant proteins expressed in soybean seeds, specifically the immunogenic tumour NY-ESO-1 antigen (cancer testis antigen 1, CTAG)46. CTAG is a protein product of the human X chromosome with 180 amino acid residues (Fig. 2), a mass of 18 kDa, a glycine-rich N-terminal region and an extremely hydrophobic C-terminal region that is so insoluble it can be confused with a transmembrane domain46, 47; it is therefore challenging to identify and characterise in TSP extracts, as in our case. Expression pattern analysis of CTAG by RT-PCR has confirmed that expression is restricted to the testis and is absent from other normal tissues, but is found in several types of cancer, including bladder, breast and lung cancer48. The recombinant CTAG produced in Escherichia coli (E. coli) was the first to be evaluated in the clinical setting and ranks among the most promising trials published so far with CTAG because of the broad immunological and favourable clinical results46, 49; thus, the use of CTAG as a vaccine is viable only if coupled with a low cost, scalable recombinant protein production system. Additionally, the nanoUPLC-MSE used in this procedure has particularities that enhance recombinant protein characterisation with high selectivity and specificity. The nanoUPLC-MSE is composed of a non-split, direct pump infusion, nanoscale liquid chromatography system (nanoACQUITY® UPLC, Waters, Milford, MA) and related columns and accessories. These include columns packed with smaller particle sizes (<2 μm)50 and columns with a smaller internal diameter (I.D. <100 μm)51. Another development, which couples RP with a different separation mechanism, is 2D chromatography.
This method can be accomplished using the ion exchange properties between the peptides or proteins and the stationary and mobile phases, e.g., an increase or decrease of chaotropic “salting plugs” or pH. For the last 10 years, this technique has typically been implemented with a strong cation exchange (SCX) column and “salting pulses” of, e.g., ammonium formate at different concentrations. Advances in this technology may allow the exploration of new frontiers in separation science to avoid ion suppression from orthogonal separation and to increase peak capacity52. These chromatography systems coupled with a high-end mass spectrometry instrument allow minimal amounts of system samples to be injected and detected with high selectivity and specificity. To achieve such high standards, every stage of this experimental workflow, from sample preparation to acquisition and processing, must be controlled to avoid contamination and other sources of error, as described in detail in this protocol.

MATERIALS
Reagents
Chemicals and solvents
Sterile deionised water with a conductivity of less than 1.3 µS/cm, total organic carbon (TOC) less than 2 ppb, and a semiconductor equivalent specification of 0.055 µS/cm (18.2 MΩ.cm) at point-of-use at 25 °C
Petroleum Ether, 30-75 °C, BAKER ANALYZED Reagent (J.T. Baker, cat. no. 9274-03)
Tris base (2-Amino-2-(hydroxymethyl)-1,3-propanediol) (Fisher Scientific Ltd, cat. no. BP152-5)
KCl (Aldrich-Sigma Chemical Co. Ltd, cat. no. P9541)
DL-Dithiothreitol (threo-1,4-dimercapto-2,3-butanediol) for molecular biology, ≥98% (DTT, Sigma-Aldrich, cat. no. D9779)
Phenylmethanesulfonyl fluoride ≥98.5% (PMSF, Sigma-Aldrich, cat. no. P7626)
Sodium dodecyl sulphate for molecular biology, ≥98.5% (SDS, Sigma-Aldrich, cat. no. L4390)
Acetone CHROMASOLV® Plus, for HPLC, ≥99.9% (Sigma-Aldrich, cat. no. 650501)
NH4HCO3 ReagentPlus®, ≥99.0% (Sigma-Aldrich, cat. no. A6141)
RapiGest™ SF (Waters, cat. no. 186001861)53
Iodoacetamide BioUltra (Sigma-Aldrich, cat. no.
I1149) Trifluoroacetic acid spectrophotometric grade, ≥99% (TFA, Sigma-Aldrich, cat. no. 302031) Acetonitrile LC-MS CHROMASOLV®, ≥99.9% (Fluka, cat. no. 34967) Formic puriss. p.a., for mass spectroscopy, ~98% (T) (FA, Fluka, cat. no. 94318) nanoACQUITY™ UPLC™ trap column Symmetry C18 5 μm, 180 µm x 20 mm trap column (Waters, cat no. 186003514) nanoACQUITY™ UPLC™ analytical column of 100 μm x 100 mm, 1.7 μm BEH130 C18 (Waters, cat. no. 186003546). Enzyme and standards Trypsin (Promega, cat. no. V511A) MassPREP Protein Digestion Standard Alcohol Dehydrogenase (MPDS ADH Waters, cat. no. 186002328) [Glu1]-Fibrinopeptide B human (GFP - Sigma-Aldrich, cat. no. F3261) Kits Quant-iT™ Protein Assay Kit, 500 Assays, 0.25-5 µg for use with the Qubit™ fluorometer (Invitrogen, cat. no. Q33212) Buffers and Solutions Extraction buffer (see REAGENT SETUP) 50 mM NH4HCO3 (see REAGENT SETUP) Digestion solution (see REAGENT SETUP) Alkylation solution (see REAGENT SETUP) Reduction solution (see REAGENT SETUP) Hydrolysis solution (see REAGENT SETUP) Sample solution for nanoUPLC-MSE analysis (see REAGENT SETUP) MPDS ADH solution (see REAGENT SETUP) Surfactant solution (see REAGENT SETUP) Mobile phase A (see REAGENT SETUP) Mobile phase B (see REAGENT SETUP) GFP solution (see REAGENT SETUP) Cold Acetone (Store acetone at -20 °C) EQUIPMENT Coffee grinder (Krups, model n. F203) Refrigerated centrifuge (Eppendorf, model 5810R) Analytical balance (Metter Toledo, cat. no. XP105D) 2 mL microtubes (Axygen, cat. no. MCT-200-C) 1.5 mL microtube (Axygen, cat. no. MCT-150-C) Vortex (Scientific industries, model G560E) Dry bath (Fisher Scientific, cat. no. 11-718-2) Waters Total Recovery vial (Waters, cat. no. 
186000384c)
nanoACQUITY™ UPLC™ system (Waters, Milford, MA, USA)
NanoLockSpray™ - nanoESI source (Waters, Manchester, UK)
Synapt HDMS™ mass spectrometer (Waters, Manchester, UK)

REAGENT SETUP
Extraction buffer (20 mM Tris-HCl, pH 8.3, 1.5 mM KCl, 10 mM DTT, 1 mM PMSF, 0.1 % w/v SDS) For 1 litre, dissolve 2.42 g of Tris base, 0.1 g of KCl, 1.54 g of DTT, 0.174 g of PMSF and 1 g of SDS in 800 mL of deionised water. Adjust the pH to 8.3 with HCl and add water to make up a final volume of 1 litre. Store at -20 °C for up to 6 months.
50 mM NH4HCO3 For 1 litre, dissolve 3.95 g of NH4HCO3 in 800 mL of deionised water and add water to a final volume of 1 litre. Filter through a 0.22 µm filter and store at room temperature (20–24 °C) for up to 6 months.
Digestion solution Add 400 μL of 50 mM NH4HCO3 to one 20 μg vial of Promega Trypsin. Make aliquots of 10 µL and store at -80 °C for up to 6 months.
Alkylation solution (300 mM iodoacetamide) For 1 mL, dissolve 55 mg of iodoacetamide in 500 µL of deionised water. Add water to 1 mL. Store at -80 °C for up to 6 months.
Reduction solution (100 mM DTT) For 1 mL, dissolve 15 mg of DTT in 500 µL of deionised water. Add water to 1 mL. Store at -80 °C for up to 6 months.
Hydrolysation solution (5 % V/V TFA) For 10 mL, add 0.5 mL of TFA to 9.5 mL of deionised water. Store at room temperature (20–24 °C) for up to 6 months.
Sample solution for nanoUPLC-MSE analysis (3 % V/V acetonitrile, 0.1% V/V FA) For 10 mL, add 0.3 mL of acetonitrile and 0.01 mL of FA to 9.69 mL of deionised water. Store at room temperature (20–24 °C) for up to 6 months.
MPDS ADH solution Add 1 mL of the sample solution for nanoUPLC-MSE analysis to one vial of MPDS ADH. Make aliquots of 10 µL and store at -80 °C for up to 6 months.
Surfactant solution (0.2 % w/v) Add 0.5 mL of water to one vial containing 1 mg of RapiGest™ SF. Store at 4 °C for up to 3 months.
Mobile phase A (0.1% V/V FA) For 1 litre, add 1 mL of FA to 999 mL of deionised water. Store at room temperature (20–24 °C) for up to 3 months.
Mobile phase B (0.1% V/V FA in acetonitrile) For 1 litre, add 1 mL of FA to 999 mL of acetonitrile. Store at room temperature (20–24 °C) for up to 1 year.
GFP solution (200 fmol.µL-1) Stock solution: dissolve the GFP in 2000 µL of acetonitrile/water 2.5/7.5 with 0.1% FA to give a solution of 32 pmol.µL-1. Store in the freezer. Take 625 µL of the stock solution and fill to 100 mL with acetonitrile/water 2.5/7.5 with 0.1% FA, giving a solution of 200 fmol.µL-1. Use within 3 months.

PROCEDURE
Total soluble protein extraction from recombinant CTAG soybean seeds. TIMING 1-2 h for one sample
1| Using a coffee grinder, grind the soybean seeds into a fine powder. Using an analytical balance, weigh out 100 mg of powder and store the remaining powder in a vacuum bag at -80 °C for up to 1 year.
2| Place the weighed sample into a 2 mL capped centrifuge tube. Add 1 mL of petroleum ether and slowly vortex the sample for 15 min. Discard the supernatant and repeat the step twice (2X). ?Troubleshooting: Gently pour off the solvent to avoid losing powder.
3| Allow the petroleum ether to evaporate for 10 min. Add 1 mL of the extraction buffer and slowly vortex the sample at room temperature for 10 min.
4| Centrifuge the sample for 5 min at 5000 r.min-1 at 4 °C. Transfer the supernatant to a new centrifuge tube. PAUSE POINT At this step, the supernatant can be stored at -20 °C for one week.
Protein concentration TIMING 1-2 h
5| For each 200 µL of sample, add 800 µL of cold acetone to the centrifuge tube. Vortex thoroughly and keep at -20 °C for 1 h, vortexing every 15 min.
6| Centrifuge the sample for 10 min at 13000 rpm. Discard the supernatant and allow the pellet to dry at room temperature for 30 min. CRITICAL STEP Do not overdry the pellet or it may become unstable and partially insoluble.
7| Carefully dissolve the pellet in 500 μL of 50 mM NH4HCO3. Quantify the protein using the Quant-iT™ Protein Assay Kit (Invitrogen) and dilute the sample with 50 mM NH4HCO3 to a concentration of 1 µg.µL-1.
At this point, the sample can be stored at -20 °C for one week. CRITICAL STEP For quantification purposes, the fluorometer must be calibrated for the correct protein dosage.
Sample preparation for nanoUPLC-MSE acquisition TIMING 2 d
8| Place 50 μL of the 1 µg.µL-1 sample in a capped microcentrifuge tube.
9| Add 10 μL of 50 mM NH4HCO3.
10| Add 25 μL of the surfactant solution and vortex. CRITICAL STEP The surfactant solution must be applied only if the sample is in the ammonium bicarbonate buffer at an alkaline pH. At an acidic pH, the surfactant will be degraded and the solution’s kinetic energy will be reduced prior to digestion, resulting in more missed cleavages and larger peptide fragments. ?Troubleshooting
11| Place the tube in a dry bath set at 80 °C. Heat for 15 min. CRITICAL STEP Ensure the dry bath has reached the correct temperature before heating the sample.
12| Remove the tube from the dry bath. Perform a short spin; then add 2.5 μL of the reduction solution and vortex briefly.
13| Place the tube in a dry bath set at 60 °C and heat for 30 min. CRITICAL STEP Ensure the dry bath has reached the correct temperature before heating the sample.
14| Remove the tube from the dry bath, allow it to cool to room temperature and then centrifuge it. Add 2.5 μL of the alkylation solution and vortex briefly.
15| Place the sample in the dark at room temperature and allow 30 min of reaction time.
16| Add 10 μL of the digestion solution and vortex briefly. Digest the sample at 37 °C in a dry bath overnight. This gives a 1:100 wt:wt ratio of enzyme:protein.
17| Following digestion, to precipitate the surfactant, add 10 μL of hydrolysation solution and vortex. Then centrifuge the samples at 14000 rpm at 6 °C for 30 min. Transfer the supernatant to a Waters Total Recovery vial.
CRITICAL STEP The surfactant must be fully precipitated to ensure proper dissolution of the protein prior to injection in the chromatograph and to avoid contamination during MSE acquisition. Ensure the centrifugation step is well controlled to avoid the injection of precipitation residues into the nanoUPLC system. ?Troubleshooting
18| Add 5 μL of the ADH solution and then add 85 μL of the sample solution for nanoUPLC-MSE analysis. The final concentration of the protein is 250 ng.μL-1 and that of ADH is 25 fmol.μL-1. The final volume is 200 μL. Store at -80 °C for up to 6 months. CRITICAL STEP Accurate pipetting of these solutions is crucial for good protein quantification by PLGS, because quantification relies on the counts/fmol stoichiometric ratio between the summed ion intensity and the concentration of a standard protein (the manual response factor). For the best quantification analysis, it is desirable to use a manual response factor instead of the concentration of the internal standard protein.
NanoUPLC-MSE acquisition TIMING 1 d
19| The nanoACQUITY™ UPLC™ system was configured as follows: the samples were initially transferred with an aqueous 0.1% formic acid solution to the trap column at a flow rate of 15 μL.min-1 for 1 min with a 5 μL loop. CRITICAL STEP To acquire data with the system, some considerations must be made during installation and engineering setup. The initial instrument setup is critical. For this purpose and for system qualification, 1 μg of the E. coli digestion standard was acquired during installation. The E. coli sample was spiked with rabbit phosphorylase B to a final concentration of 40 fmol.μL-1 on the column. The expected dynamic range was measured and the specifications were applied to reach a minimum of 2-3 orders of magnitude for the Synapt HDMS first generation mass spectrometer. After system qualification was completed, the samples were run in the MSE positive mode with a nano-electrospray source.
20| The peptides were separated with a gradient of 5–40 % mobile phase B over 90 min at a flow rate of 600 nL.min-1, followed by a 10 min rinse with 85% mobile phase B.
21| The column was re-equilibrated at the initial conditions for 10 min. The column temperature was maintained at 35 °C. The lock mass was delivered from the auxiliary pump of the nanoACQUITY system at a constant flow rate of 150 nL.min-1, using the 200 fmol.µL-1 GFP solution (Sigma-Aldrich, USA), to the reference sprayer of the NanoLockSpray™ source of the mass spectrometer. ?Troubleshooting: The column diameter is critical to achieve the best resolving power and increase the peak capacity. For optimum loading of 75 μm inner diameter columns, consider using 250 to 500 ng of protein digest and 200 to 400 nL.min-1; for 100 μm columns, use 440 to 880 ng of digest and 400 to 600 nL.min-1; for 150 μm columns, use 1 to 2 μg of digest and 800 nL.min-1 to 1.2 μL.min-1; and for 300 μm columns, use 4 to 8 μg and 4 to 5 μL.min-1 with an analytical ESI source. If the analysis is with a common 2D SCX or 2D with dilution, the amount of sample injected can be multiplied by the fraction number to keep the column capacity at a maximum.
22| All samples were analysed in triplicate using a Synapt HDMS™ first generation mass spectrometer. For all measurements, the mass spectrometer operated in the “V mode” of analysis with a typical resolving power of at least 10000 full-width half-maximum (FWHM) and a sampling rate of 10 to 20 points across the chromatography peak to provide good quantification and peak representation in the chromatogram.
23| All analyses were performed using the positive nano-electrospray ion mode (nanoESI+).
24| The time-of-flight analyser of the mass spectrometer was externally calibrated with GFP b+ and y+ ions from m/z 50 to 1990, and the data were lock mass corrected post acquisition using the GFP monoisotopic precursor ion of [M + 2H]2+ = 785.8426.
25| The reference sprayer was sampled with a frequency of 30 s. 26| The nanoUPLC-MSE data were collected in an alternating low energy and elevated energy mode of acquisition. The continuum spectra acquisition time in each mode was 1.5 s of scan time with at least 10 points per peak on the chromatogram. 27| In the low energy MS mode, the data were collected at a constant collision energy of 3 eV. 28| In the elevated energy MS mode, the collision energy was increased from 12 to 45 eV during each 1.5 s spectrum. 29| The radiofrequency applied to the quadrupole mass analyser was adjusted such that ions from m/z 50 to 2000 were efficiently transmitted. Data Processing and Protein Identification TIMING 1 d 30| The MS data obtained from the nanoUPLC-MSE were processed and searched using the ProteinLynxGlobalServer (PLGS) version 2.4v configured as follows. Sequences from Glycine max were downloaded from UniProt54. In PLGS, a new databank named “GLYCINE” was created, and the file containing amino acid sequences was appended. The protein identifications were obtained with the embedded ion accounting algorithm of the software and by searching the database with MassPREP™ Protein Digestion Standards (MPDS) inside as an UniProtKB/Swiss-Prot sequences (Phosphorylase P00489 - PHS2_RABIT, Bovine Hemoglobin - P02070 - HBB_BOVIN, ADH - P00330 - ADH1_YEAST, BSA - P02769 - ALBU_BOVIN) and a CTAG-P78358 protein appended to the database. CRITICAL STEP: The database must be correctly loaded into the PLGS. The identifications and quantitative data packaging were generated using dedicated algorithms42, 55 and searching against a species-specific database56. Refer to the software manual on how to proceed with the input method into the databank administration tool. ?Troubleshooting. 31| In PLGS, a new workflow was created for Electrospray-MSE analysis by setting the data bank to “GLYCINE” and setting the peptide and fragment tolerance to automatic. 
The minimum fragment ion matches per peptide was set to 3. The minimum fragment ion matches per protein was set to 7. The minimum peptide matches per protein was set to 1. The maximum protein mass was set to 600 kDa. Trypsin was chosen as the primary digest reagent, allowing 1 missed cleavage. Carbamidomethyl-C and the oxidation of M were set to fixed and variable modification, respectively. N-linked and O-linked options were set as variable glycosylation modification, the calibration protein was set to P00330 (corresponding to ADH sequence in database) and the calibration protein concentration was set to 25 fmol.uL-1. CRITICAL STEP: These configurations will determine the protein identification processes and may vary from sample to sample. Changes in specificity and selectivity can vary because the minimum fragment ion matches per peptide was set to 3 and can be as low as 1; the minimum fragment ion matches per protein was set to 7 and can be as low as 5; and the minimum peptide matches per protein was set to 1. The maximum protein mass was set to 600 kDa; if the EST database was used, this can be increased to at least 1000 kDa. For standard concentration assignments, it is preferable to use the manual response to keep the counts/fmol ratio within a minimum coefficient of variation (CV). 32| In PLGS, a new data preparation was created for Electrospray-MSE analysis by setting the chromatographic peak width and MS TOF resolution in automatic mode. The lock mass for charge 2 was set to m/z 785.8426 (corresponding to GFP mass), and the lock mass windows were set to ±0.25 Da. The low and elevated energy thresholds were set to 250.0 and 100.0 counts, respectively. The retention time windows were set to automatic, and 1500 counts were applied to the intensity threshold. CRITICAL STEP: Ensure the m/z value of GFP and the charge state set are correctly assigned to avoid error in the PLGS processing. Check the instrument calibration prior to analysis. 
If the interval window is more than 0.4 Da for GFP, calibrate the instrument. ?Troubleshooting 33| In PLGS, open a new project. Add 3 new original samples, named SOYCTAG L3, SOYCTAG L37, and SoyCN, which correspond to the lineage 3, 37 of the recombinant CTAG in soybean and non-transgenic soybean samples to be analysed and compared, respectively. If more samples need to be compared, add more original sample tags. 34| In PLGS, add a new microlitre plate named CTAG. For each sample, add the original raw data from the acquisition, the data preparation file and the workflow file to a vial position. After the files are combined, raw data processing is possible. Tables 2 and 3 indicate a typical result. CRITICAL STEP: Ion detection, clustering, and normalisation were performed in PLGS with ExpressionE software license installed (Waters, Manchester, UK). The intensity measurements are typically adjusted, i.e., deisotoped and charge state-reduced EMRTs that replicate throughout the complete experiment for analysis at the EMRT cluster level. The components are typically clustered together with a 10 ppm mass precision and a 0.25-min time tolerance or sufficient value to achieve at least 15 points per peak. The alignment of elevated energy ions with low energy precursor peptide ions is conducted with an approximate precision of 0.05 min. To analyse the protein identification and quantification level, the observed intensity measurements are normalised to the intensity measurement of the identified peptides of the digested internal standard, as described elsewhere56. 35| For expression analysis, add a new “expression analysis” in PLGS, placing the samples created in step 33 into separate groups. In the quantification analysis, use the normalisation in proteins, selecting ADH protein in the table. The results are shown in Fig. 4. Troubleshooting advice can be found in Table 3. 
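The volumes and concentrations stated in steps 7 to 18 can be cross-checked with a short calculation. The following Python sketch is only a bookkeeping aid and is not part of the PLGS workflow; the function and variable names are hypothetical, all volumes are taken from the steps above, and the ADH stock concentration is not stated in the protocol but is back-calculated here from the stated final concentration of 25 fmol.μL-1.

```python
# Bookkeeping sketch for steps 7-18 (names are illustrative, not from PLGS).
# Volumes in microlitres, exactly as stated in the protocol steps.
STEP_VOLUMES_UL = {
    "TSP sample": 50.0,              # step 8
    "50 mM NH4HCO3": 10.0,           # step 9
    "surfactant solution": 25.0,     # step 10
    "reduction solution": 2.5,       # step 12
    "alkylation solution": 2.5,      # step 14
    "digestion solution": 10.0,      # step 16
    "hydrolysation solution": 10.0,  # step 17
    "ADH solution": 5.0,             # step 18
    "sample solution": 85.0,         # step 18
}

TSP_CONC_UG_PER_UL = 1.0        # step 7: sample diluted to 1 ug/uL
TRYPSIN_PER_VIAL_UG = 20.0      # REAGENT SETUP: one 20 ug vial of trypsin
TRYPSIN_DILUENT_UL = 400.0      # REAGENT SETUP: dissolved in 400 uL
ADH_FINAL_FMOL_PER_UL = 25.0    # step 18: stated final ADH concentration


def total_volume_ul(volumes=STEP_VOLUMES_UL):
    """Final volume in the vial after step 18."""
    return sum(volumes.values())


def protein_conc_ng_per_ul(volumes=STEP_VOLUMES_UL):
    """Final TSP concentration, ignoring any losses during handling."""
    protein_ug = volumes["TSP sample"] * TSP_CONC_UG_PER_UL
    return protein_ug * 1000.0 / total_volume_ul(volumes)


def protein_to_trypsin_ratio(volumes=STEP_VOLUMES_UL):
    """Mass of protein per unit mass of trypsin (100 means 1:100 enzyme:protein)."""
    trypsin_ug = TRYPSIN_PER_VIAL_UG * volumes["digestion solution"] / TRYPSIN_DILUENT_UL
    protein_ug = volumes["TSP sample"] * TSP_CONC_UG_PER_UL
    return protein_ug / trypsin_ug


def implied_adh_stock_fmol_per_ul(volumes=STEP_VOLUMES_UL):
    """ADH stock concentration implied by the stated final concentration (inferred)."""
    return ADH_FINAL_FMOL_PER_UL * total_volume_ul(volumes) / volumes["ADH solution"]


if __name__ == "__main__":
    print("final volume (uL):", total_volume_ul())
    print("protein (ng/uL):", protein_conc_ng_per_ul())
    print("protein:trypsin (w/w):", protein_to_trypsin_ratio())
    print("implied ADH stock (fmol/uL):", implied_adh_stock_fmol_per_ul())
```

Changing any volume in the dictionary shows immediately how the final protein and ADH concentrations shift, which is useful when the starting sample volume has to be reduced.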
ANTICIPATED RESULTS
This is an easy-to-follow protocol to determine whether a target recombinant protein has been expressed in any expression system, especially where only a small sample is available or no antibody is available to run blotting detection methods. We successfully detected the human growth hormone and coagulation factor IX proteins expressed in transgenic soybean lines9, 10 and present the preliminary results on the CTAG recombinant molecule expressed in the same system. Two lineages, SOYCTAG L3 and SOYCTAG L37, and a SOYBEAN Negative from the BR-16 cultivar were used as samples in this protocol. The amino acid sequence of CTAG can be observed in Fig. 2. Fig. 1 shows a diagram of the workflow. The sample preparation from TSP to the nanoUPLC procedure is critical for a successful identification. The use of high purity water and reagents is recommended due to the sensitivity of the technique. The low peptide dilution provided by nanoUPLC permits each compound to enter the mass spectrometer almost individually, allowing the production of MS and MS/MS spectra from almost every peptide in the sample. When nanoACQUITY is associated with MSE acquisitions43, the ion current is continuous and both MS and MS/MS are acquired in parallel, so the chromatography peaks are sharpened as more points per peak are obtained, and there is high reproducibility between different injections, usually in the full loop method with 2 μL or 5 μL sample injection loading. Fig. 3 shows the resulting nanoUPLC chromatogram, the MSE spectra from the [M + 2H]2+ = 857.87 CTAG fragment and the respective spectra processed by PLGS. Orthogonal separations57 with SCX columns58, 59, or recent technologies that use a first-dimension linear gradient with fractions at different pH levels and high-resolution separations in both the first and second dimensions52, are justified by the complexity of the chromatogram in this particular sample (Fig. 2A).
To improve separation, this nanoUPLC system can be used with 2D RPxRP nanocolumns with small particle sizes (1.7 μm for BEH or 1.8 μm for HSS T3 capillary column technologies). These allow a high-resolution first-dimension separation with organic mobile phase pulse fractions and 20 mM ammonium formate at pH 10 on a 300 μm x 50 mm XBridge™ BEH 130 Å C18 5 μm column (Waters, Milford, MA), and a second-dimension separation with a trap column followed by an analytical column of 75 μm x 100 mm at a low pH of 2.6. Even so, five peptides from CTAG (Table 2, Fig. 2) were detected with high selectivity and specificity. These peptides showed no trace of post-translational modification, but the possibility cannot be ruled out because another 6 CTAG peptides were not detected (Fig. 2). Additionally, a proteomic profile can be processed with absolute quantitative values for each protein (Table 1). In this example, the CTAG recombinant protein was detected and quantified in nanograms based on the stoichiometric ion intensity values of the minimum three prototypic peptides of ADH and the identified protein. A relation between the total detected protein and the specific protein concentration can be applied, allowing calculation of the percentage of the expressed protein in relation to TSP. The percentage of each detected protein can be observed in Table 1. CTAG has an expression value of 0.1%, which is low compared to that of the other transgenic soybean seeds expressing hGH9 (2.9%), but similar to that of factor IX (0.2%)10. Other soybean proteins, such as β-conglycinin and glycinin, have the values expected for the main storage proteins of soybean seeds60. Through this protocol, it is also possible to check protein expression changes by comparing two or more samples. Fig. 4 shows a two-by-two comparison among the SOYCTAG L3, SOYCTAG L37 and SOYBEAN Negative protein expression lists.
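The quantification logic described in this section (a response factor from the three most intense peptides of the ADH standard, conversion to nanograms, and expression relative to the total detected protein) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the PLGS implementation: the function names are hypothetical, the summed three most intense peptides are assumed to be the basis of the counts/fmol response factor, and any intensity values supplied to it are the user's own data.

```python
# Simplified label-free quantification sketch (illustrative names, not PLGS code).
# ASSUMPTION: the response factor is the summed intensity of the three most
# intense peptides of the internal standard divided by its amount on column.

def top3_sum(peptide_intensities):
    """Sum of the three most intense peptide signals (counts)."""
    return sum(sorted(peptide_intensities, reverse=True)[:3])


def response_factor(standard_intensities, standard_fmol):
    """Counts per fmol for the internal standard (e.g. ADH) on column."""
    return top3_sum(standard_intensities) / standard_fmol


def protein_fmol(peptide_intensities, counts_per_fmol):
    """Estimated amount of a detected protein in fmol."""
    return top3_sum(peptide_intensities) / counts_per_fmol


def fmol_to_ng(fmol, mass_da):
    """Convert fmol to ng: 1 fmol of a 1 Da species weighs 1e-6 ng."""
    return fmol * mass_da * 1e-6


def percent_of_total(target_ng, all_detected_ng):
    """Expression of one protein as a percentage of the total detected protein."""
    return 100.0 * target_ng / sum(all_detected_ng)
```

For example, with the 18 kDa mass given for CTAG, `fmol_to_ng(protein_fmol(ctag_intensities, rf), 18000)` would give the nanogram amount that enters `percent_of_total`, where `ctag_intensities` and `rf` are placeholders for measured peptide intensities and the ADH response factor.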
It is possible to compare the expression levels of the two transgenic lines and choose the one with higher recombinant protein production, in this case SOYCTAG L37. This technique, together with the IdentityE and ExpressionE software in PLGS (Waters, UK), can also be used to check for up- and down-regulation of native proteins, providing information on the side effects of transgene introduction at the proteomic level.

References
1. Swartz, J.R. Advances in Escherichia coli production of therapeutic proteins. Curr. Opin. Biotechnol. 12, 195–201 (2001).
2. Chu, L. & Robinson, D.K. Industrial choices for protein production by large-scale cell culture. Curr. Opin. Biotechnol. 12, 180–187 (2001).
3. Tremblay, R., Wang, D., Jevnikar, A.M. & Ma, S. Tobacco, a highly efficient green bioreactor for production of therapeutic proteins. Biotechnol. Adv. 28, 214–221 (2010).
4. Daniell, H., Singh, N.D., Mason, H. & Streatfield, S.J. Plant-made vaccine antigens and biopharmaceuticals. Trends Plant Sci. 14, 669–679 (2009).
5. Boothe, J. et al. Seed-based expression systems for plant molecular farming. Plant Biotechnol. J. 8, 588–606 (2010).
6. Cunha, N.B.d. et al. Correct targeting of proinsulin in protein storage vacuoles of transgenic soybean seeds. Genet. Mol. Res. 9, 1163–1170 (2010).
7. Jolliffe, N.A., Craddock, C.P. & Frigerio, L. Pathways for protein transport to seed storage vacuoles. Biochem. Soc. Trans. 33, 1016–1018 (2005).
8. Ma, J.K.-C., Drake, P.M.W. & Christou, P. The production of recombinant pharmaceutical proteins in plants. Nat. Rev. Genet. 4, 794–805 (2003).
9. Cunha, N.B. et al. Expression of functional recombinant human growth hormone in transgenic soybean seeds. Transgenic Res. (2010).
10. Cunha, N.B. et al. Accumulation of functional recombinant human coagulation factor IX in transgenic soybean seeds. Transgenic Res. (2010).
11. Rech, E.L., Vianna, G.R. & Aragão, F.J.L. High-efficiency transformation by biolistics of soybean, common bean and cotton transgenic plants. Nat. Protoc. 3, 410–418 (2008).
12. Blas, A.L.D. & Cherwinski, H.M. Detection of antigens on nitrocellulose paper immunoblots with monoclonal antibodies. Anal. Biochem. 133, 214–219 (1983).
13. Perlmann, P. & Engvall, E. Enzyme-linked immunosorbent assay (ELISA). Quantitative assay of immunoglobulin G. Immunochemistry 8, 871–874 (1971).
14. O'Farrell, P.H. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250, 4007–4021 (1975).
15. Shevchenko, A., Tomas, H., Havlis, J., Olsen, J.V. & Mann, M. In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 1, 2856–2860 (2006).
16. Weiss, W. & Görg, A. Two-dimensional electrophoresis for plant proteomics. Methods Mol. Biol. 355, 121–143 (2007).
17. Blackstock, W.P. & Weir, M.P. Proteomics: quantitative and physical mapping of cellular proteins. Trends Biotechnol. 17, 121–127 (1999).
18. Murad, A.M. et al. Screening of entomopathogenic Metarhizium anisopliae isolates and proteomic analysis of secretion synthesized in response to cowpea weevil (Callosobruchus maculatus) exoskeleton. Comp. Biochem. Physiol., C 142, 365–370 (2006).
19. Murad, A.M. et al. Proteomic analysis of Metarhizium anisopliae secretion in the presence of the insect pest Callosobruchus maculatus. Microbiology 154, 3766–3774 (2008).
20. Halligan, B.D. ProMoST: A tool for calculating the pI and molecular mass of phosphorylated and modified proteins on 2 dimensional gels. Methods Mol. Biol. 527, 283–298 (2009).
21. Henzel, W.J. et al. Identifying proteins from two-dimensional gels by molecular mass searching of peptide fragments in protein sequence databases. Proc. Natl. Acad. Sci. U. S. A. 90, 5011–5015 (1993).
22. Wilson, N., Simpson, R. & Cooper-Liddell, C. Introductory glycosylation analysis using SDS-PAGE and peptide mass fingerprinting. Methods Mol. Biol. 534, 205–212 (2009).
23. Gevaert, K. et al. Exploring proteomes and analyzing protein processing by mass spectrometric identification of sorted N-terminal peptides. Nat. Biotechnol. 21, 566–569 (2003).
24. Hunter, A.P. & Games, D.E. Chromatographic and mass spectrometric methods for the identification of phosphorylation sites in phosphoproteins. Rapid Commun. Mass. Spectrom. 8, 559–570 (1994).
25. Wilkins, J.A., Xiang, R. & Horváth, C. Selective enrichment of low-abundance peptides in complex mixtures by elution-modified displacement chromatography and their identification by electrospray ionization mass spectrometry. Anal. Chem. 74, 3933–3941 (2002).
26. Husson, S.J. et al. Comparative peptidomics of Caenorhabditis elegans versus C. briggsae by LC–MALDI-TOF MS. Peptides 30, 449–457 (2009).
27. Guerrier, L. & Boschetti, E. Protocol for the purification of proteins from biological extracts for identification by mass spectrometry. Nat. Protoc. 2, 832–837 (2007).
28. Guerrier, L., Righetti, P.G. & Boschetti, E. Reduction of dynamic protein concentration range of biological extracts for the discovery of low-abundance proteins by means of hexapeptide ligand library. Nat. Protoc. 3, 883–890 (2008).
29. Deterding, L.J., Moseley, M.A., Tomer, K.B. & Jorgenson, J.W. Nanoscale separations combined with tandem mass spectrometry. J. Chromatogr. A 554, 73–82 (1991).
30. Shen, Y. et al. High-efficiency nanoscale liquid chromatography coupled on-line with mass spectrometry using nanoelectrospray ionization for proteomics. Anal. Chem. 74, 4235–4249 (2002).
31. Mirgorodskaya, E., Braeuer, C., Fucini, P., Lehrach, H. & Gobom, J. Nanoflow liquid chromatography coupled to matrix-assisted laser desorption/ionization mass spectrometry: Sample preparation, data analysis, and application to the analysis of complex peptide mixtures. Proteomics 5, 399–408 (2005).
32. Nouri, M.-Z. & Komatsu, S. Comparative analysis of soybean plasma membrane proteins under osmotic stress using gel-based and LC MS/MS-based proteomics approaches. Proteomics 10, 1930–1945 (2010).
33. Brumbarova, T., Matros, A., Mock, H.-P. & Bauer, P. A proteomic study showing differential regulation of stress, redox regulation and peroxidase proteins by iron supply and the transcription factor FER. Plant J. 54, 321–334 (2008).
34. Behrens, H.L., Chen, R. & Li, L. Combining microdialysis, NanoLC-MS, and MALDI-TOF/TOF to detect neuropeptides secreted in the crab, Cancer borealis. Anal. Chem. 80, 6949–6958 (2008).
35. Zybailov, B. et al. Sorting signals, N-terminal modifications and abundance of the chloroplast proteome. PLoS one 3, e1994 (2008).
36. Barnes, S. et al. High-resolution mass spectrometry analysis of protein oxidations and resultant loss of function. Biochem. Soc. Trans. 36, 1037–1044 (2008).
37. Bouché, J.-P. et al. NanoLC-MS/MS analysis provides new insights into the phosphorylation pattern of Cdc25B in vivo: full overlap with sites of phosphorylation by Chk1 and Cdk1/cycB kinases in vitro. J. Proteome Res. 7, 1264–1273 (2008).
38. Unwin, R.D., Griffiths, J.R. & Whetton, A.D. A sensitive mass spectrometric method for hypothesis-driven detection of peptide post-translational modifications: multiple reaction monitoring-initiated detection and sequencing (MIDAS). Nat. Protoc. 4, 870–877 (2009).
39. Mori, M. et al. Production of 18O-single labeled peptide fragments during trypsin digestion of proteins for quantitative proteomics using nanoLC−ESI−MS/MS. J. Proteome Res. 9, 3741–3749 (2010).
40. Yang, Y. et al. A comparison of nLC-ESI-MS/MS and nLC-MALDI-MS/MS for GeLC-based protein identification and iTRAQ-based shotgun quantitative proteomics. J. Biomol. Tech. 18, 226–237 (2007).
41. Levin, Y. et al. Real-time evaluation of experimental variation in large-scale LC–MS/MS-based quantitative proteomics of complex samples. J. Chromatogr. B 877, 1299–1305 (2009).
42. Li, G.-Z. et al. Database searching and accounting of multiplexed precursor and product ion spectra from the data independent analysis of simple and complex peptide mixtures. Proteomics 9, 1696–1719 (2009).
43. Geromanos, S.J. et al. The detection, correlation, and comparison of peptide precursor and product ions from data independent LC-MS with data dependant LC-MS/MS. Proteomics 9, 1683–1695 (2009).
44. Xu, D. et al. Novel MMP-9 Substrates in Cancer Cells Revealed by a Label-free Quantitative Proteomics Approach. Mol. Cell Proteomics 7, 2215–2228 (2008).
45. Cheng, F.-y., Blackburn, K., Lin, Y.-m., Goshe, M.B. & Williamson, J.D. Absolute protein quantification by LC/MSE for global analysis of salicylic acid-induced plant protein secretion responses. J. Proteome Res. 8, 82–93 (2009).
46. Gnjatic, S. et al. NY-ESO-1: Review of an Immunogenic Tumor Antigen. Adv. Cancer Res. 95, 1–30 (2006).
47. Chen, Y. et al. A testicular antigen aberrantly expressed in human cancers detected by autologous antibody screening. Proc. Natl. Acad. Sci. U. S. A. 94, 1914–1918 (1997).
48. Kurashige, T. et al. NY-ESO-1 expression and immunogenicity associated with transitional cell carcinoma: correlation with tumor grade. Cancer Res. 61, 4671–4674 (2001).
49. Murphy, R. et al. Recombinant NY-ESO-1 cancer antigen: production and purification under cGMP conditions. Prep. Biochem. Biotechnol. 35, 119–134 (2005).
50. Liu, H. et al. Effects of column length, particle size, gradient length and flow rate on peak capacity of nano-scale liquid chromatography for peptide separations. J. Chromatogr. A 1147, 30–36 (2007).
51. Liu, H., Finch, J.W., Luongo, J.A., Li, G.-Z. & Gebler, J.C. Development of an online two-dimensional nano-scale liquid chromatography/mass spectrometry method for improved chromatographic performance and hydrophobic peptide recovery. J. Chromatogr. A 1135, 43–51 (2006).
52. Gilar, M., Olivova, P., Daly, A.E. & Gebler, J.C. Two-dimensional separation of peptides using RP-RP-HPLC system with different pH in first and second separation dimensions. J. Sep. Sci. 28, 1694–1703 (2005).
53. Yu, Y.-Q., Gilar, M., Lee, P.J., Bouvier, E.S.P. & Gebler, J.C. Enzyme-friendly, mass spectrometry-compatible surfactant for in-solution enzymatic digestion of proteins. Anal. Chem. 75, 6023–6028 (2003).
54. Consortium, T.U. The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res. 38, D142–D148 (2010).
55. Silva, J.C. et al. Quantitative proteomic analysis by accurate mass retention time pairs. Anal. Chem. 77, 2187–2200 (2005).
56. Silva, J.C., Gorenstein, M.V., Li, G.-Z., Vissers, J.P.C. & Geromanos, S.J. Absolute quantification of proteins by LCMSE: a virtue of parallel MS acquisition. Mol. Cell Proteomics 5, 144–156 (2005).
57. Gilar, M., Olivova, P., Daly, A.E. & Gebler, J.C. Orthogonality of separation in two-dimensional liquid chromatography. Anal. Chem. 77, 6426–6434 (2005).
58. Millea, K.M. et al. Evaluation of multidimensional (ion-exchange/reversed-phase) protein separations using linear and step gradients in the first dimension. J. Chromatogr. A 1079, 287–298 (2005).
59. Gilar, M. et al. Comparison of 1-D and 2-D LC MS/MS methods for proteomic analysis of human serum. Electrophoresis 30, 1157–1167 (2009).
60. Li, C. & Zhang, Y.-M. Molecular evolution of glycinin and β-conglycinin gene families in soybean (Glycine max L. Merr.). Heredity doi 10.1038/hdy.2010.97 (2010).

Acknowledgements
We are grateful to G. Ritter at Ludwig Cancer Research Institute (New York Branch) for providing genes and antibodies. We acknowledge support from C. Bloch at the Mass Spectrometry Laboratory-EMBRAPA. We acknowledge discussions with G. Ritter and C. Bloch and thank J. Taquita for technical help. This work was supported by Brazilian Agricultural Research Corporation, National Council for Scientific and Technological Development and Fundacao de Apoio a Pesquisa-DF.
Table 1 | List of identified proteins by PLGS in the CTAG soybean transgenic line.
Entry | Description | mW (Da) | pI (pH) | PLGS Score | Amount (ng) | % of TSP
P78358 | Cancer testis antigen 1 | 17981 | 8.4739 | 2886.386 | 0.2635 | 0.11445
O22120 | α-subunit of β-conglycinin Fragment OS Glycine max PE 2 SV 2 | 63126 | 4.7254 | 51090.21 | 65.47 | 28.43671
C6T488 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 24103 | 5.1341 | 49332.05 | 0 | 0
P04776 | Glycinin G1 OS Glycine max GN GY1 PE 1 SV 2 | 55671 | 5.8257 | 34638.61 | 28.6832 | 12.45847
P19594 | 2S albumin OS Glycine max PE 1 SV 2 | 18447 | 5.0153 | 26866.75 | 6.9832 | 3.033133
Q549Z4 | Proglycinin A2B1 OS Glycine max PE 2 SV 1 | 54356 | 5.2983 | 26163.09 | 9.1288 | 3.965068
P04405 | Glycinin G2 OS Glycine max GN Gy2 PE 1 SV 2 | 54356 | 5.2983 | 26155.1 | 16.8381 | 7.31358
C6TKH0 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 31640 | 6.4124 | 25943.07 | 4.4002 | 1.911214
B3TDK4 | Lipoxygenase OS Glycine max PE 3 SV 1 | 94352 | 5.8755 | 23234.38 | 11.2248 | 4.87546
P08170 | Seed lipoxygenase 1 OS Glycine max GN LOX1 1 PE 1 SV 2 | 94310 | 5.9301 | 22866.77 | 0 | 0
P01063 | Bowman Birk type proteinase inhibitor C II OS Glycine max PE 1 SV 2 | 9194 | 4.3797 | 19673.12 | 2.7779 | 1.206573
P01064 | Bowman Birk type proteinase inhibitor D II OS Glycine max PE 1 SV 2 | 9460 | 4.6657 | 18789.71 | 1.434 | 0.622854
P24337 | Hydrophobic seed protein OS Glycine max PE 1 SV 1 | 8353 | 6.0467 | 17254.79 | 0.5404 | 0.234721
Q39805 | Dehydrin-like protein OS Glycine max PE 2 SV 1 | 23703 | 6.084 | 16428.98 | 5.4437 | 2.364455
Q7GC77 | Glycinin A3B4 subunit OS Glycine max PE 1 SV 1 | 58151 | 5.4199 | 14016.68 | 0.3179 | 0.138079
Q852U4 | Glycinin A1bB2 784 OS Glycine max PE 2 SV 1 | 54264 | 5.9489 | 12395.54 | 0.6519 | 0.283151
Q852U5 | Glycinin A1bB2 445 OS Glycine max PE 2 SV 1 | 54183 | 5.7768 | 12393.59 | 0.4914 | 0.213438
P05046 | Lectin OS Glycine max GN LE1 PE 1 SV 1 | 30908 | 5.5955 | 11981.52 | 8.6499 | 3.757059
C6T9Z5 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 42796 | 6.2935 | 11699.31 | 1.0956 | 0.475871
C6TDF5 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 41854 | 6.9985 | 11311.33 | 0.5088 | 0.220996
Q9SEK9 | Seed maturation protein PM25 OS Glycine max GN PM25 PE 2 SV 1 | 25713 | 4.7899 | 9964.625 | 1.0382 | 0.450939
Q9SEK8 | Seed maturation protein PM26 OS Glycine max GN PM26 PE 2 SV 1 | 26087 | 4.63 | 9770.589 | 0.9765 | 0.42414
Q9XET1 | Seed maturation protein PM31 OS Glycine max GN PM31 PE 2 SV 1 | 17735 | 6.104 | 9168.433 | 1.5019 | 0.652346
Q9SEL0 | Seed maturation protein PM24 OS Glycine max GN PM24 PE 2 SV 1 | 26824 | 4.9752 | 8024.353 | 0.8781 | 0.3814
Q9XER5 | Seed maturation protein PM22 OS Glycine max GN PM22 PE 2 SV 1 | 16677 | 4.9629 | 7963.376 | 0.6137 | 0.266559
Q9LLQ6 | Seed maturation protein PM34 OS Glycine max GN PM34 PE 2 SV 1 | 31746 | 6.6812 | 7863.564 | 0.43 | 0.186769
C6T1Q7 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17812 | 5.9577 | 7729.998 | 1.3127 | 0.570167
C6T588 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 16750 | 4.5317 | 7076.108 | 0.6071 | 0.263692
Q9AVK8 | Allergen Gly m Bd 28K Fragment OS Glycine max PE 2 SV 1 | 52608 | 5.6576 | 5503.088 | 2.8615 | 1.242884
Q2I0H4 | Glyceraldehyde 3 phosphate dehydrogenase OS Glycine max GN GAPC1 PE 2 SV 1 | 36741 | 6.8421 | 5307.646 | 1.4628 | 0.635363
Q9XET0 | Putative uncharacterised protein OS Glycine max GN PM30 PE 2 SV 1 | 15088 | 9.4202 | 5145.053 | 0.761 | 0.330538
C6TBB3 | Putative uncharacterised protein OS Glycine max PE 4 SV 1 | 12337 | 5.3837 | 4899.041 | 0.1225 | 0.053208
P93165 | Em protein OS Glycine max PE 4 SV 1 | 11484 | 5.3518 | 4895.396 | 0.0814 | 0.035356
Q04672 | Sucrose binding protein OS Glycine max GN SBP PE 1 SV 1 | 60484 | 6.4228 | 4608.476 | 3.3696 | 1.463576
C6SVM2 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17367 | 9.468 | 4510.81 | 0.5253 | 0.228163
Q07CZ3 | Glyceraldehyde 3 dehydrogenase C subunit OS Glycine max PE 2 SV 1 | 36701 | 6.8421 | 4077.103 | 0 | 0
C6SWV3 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 27618 | 5.6695 | 3894.106 | 0.7139 | 0.31008
C6TB70 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 24404 | 6.5098 | 3485.424 | 0.7356 | 0.319506
Q9XES8 | Seed maturation protein PM28 OS Glycine max GN PM28 PE 4 SV 1 | 9506 | 4.4641 | 3193.133 | 0.1892 | 0.082178
C6T0L2 | Putative uncharacterised protein OS Glycine max PE 4 SV 1 | 11134 | 6.3754 | 3078.308 | 1.1507 | 0.499803
Q38IW8 | Triosephosphate isomerase OS Glycine max PE 2 SV 1 | 27187 | 5.8176 | 2909.143 | 0.2015 | 0.087521
Q9SWB2 | Seed maturation protein PM41 OS Glycine max GN PM41 PE 4 SV 1 | 8172 | 4.6642 | 2896.554 | 0.2108 | 0.09156
Q42795 | β-amylase OS Glycine max PE 1 SV 1 | 56036 | 5.1887 | 2892.254 | 2.0148 | 0.875123
Q39871 | Late embryogenesis abundant protein OS Glycine max GN MP2 PE 2 SV 1 | 50613 | 6.2924 | 2760.776 | 4.4162 | 1.918164
C6T0B5 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 13998 | 5.697 | 2535.701 | 0.4396 | 0.190939
C6SVR5 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 23888 | 5.636 | 2489.419 | 0.4216 | 0.183121
P00330 | ALCOHOL DEHYDROGENASE I EC 1 1 1 1 | 36668 | 6.2734 | 2351.86 | 0.9173 | 0.398427
C6SZ11 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 27031 | 6.4528 | 2284.277 | 0.5469 | 0.237544
O64458 | Gly m Bd 30K allergen OS Glycine max GN P34 PE 2 SV 1 | 42730 | 5.5616 | 2154.713 | 3.4426 | 1.495283
C6TD82 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 31058 | 7.5082 | 2149.048 | 0.0461 | 0.020023
C6TCF1 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 27781 | 5.0962 | 2055.682 | 0.3383 | 0.14694
C6TB67 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 22971 | 7.7157 | 1790.652 | 0.3387 | 0.147113
C6EVF9 | Elongation factor 1 α OS Glycine max GN EF 1A PE 2 SV 1 | 49365 | 9.2369 | 1653.604 | 1.3382 | 0.581243
C6T072 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17442 | 5.2698 | 1651.261 | 0.3113 | 0.135212
C6SWE0 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17355 | 5.2698 | 1615.874 | 0.1385 | 0.060157
P26413 | Heat shock 70 kDa protein OS Glycine max GN HSP70 PE 3 SV 1 | 70835 | 5.1815 | 1481.25 | 1.7718 | 0.769576
Q6RIB6 | Malate dehydrogenase OS Glycine max PE 2 SV 1 | 35504 | 6.3424 | 1466.792 | 0.4538 | 0.197107
C6TK76 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 41507 | 5.1583 | 1249.91 | 0.1395 | 0.060591
C6TGM9 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 22317 | 5.9214 | 1194.818 | 0.1194 | 0.051861
A1KR24 | Dehydrin OS Glycine max GN LEA 2 D11 PE 3 SV 1 | 25369 | 6.1198 | 1148.557 | 0.8074 | 0.350692
C6T920 | Phosphoglycerate kinase Fragment OS Glycine max PE 2 SV 1 | 25296 | 9.7126 | 1114.197 | 0.2452 | 0.106502
Q84V19 | Sucrose binding protein 2 OS Glycine max GN SBP2 PE 2 SV 1 | 55740 | 6.1009 | 1069.194 | 0.3031 | 0.131651
Q9SP11 | Sucrose binding protein homolog S 64 OS Glycine max GN SBP PE 2 SV 1 | 55799 | 6.316 | 1005.86 | 0.2359 | 0.102462
Q96450 | 14 3 3-like protein A OS Glycine max GN GF14A PE 2 SV 1 | 29030 | 4.4978 | 910.0024 | 0.4026 | 0.174868
C6T9C2 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 34556 | 5.8074 | 842.0455 | 0.8502 | 0.369282
Q71EW8 | Methionine synthase OS Glycine max PE 2 SV 1 | 84229 | 5.8874 | 833.6119 | 1.3438 | 0.583676
C6K8D1 | Seed biotinylated protein 68 kDa isoform OS Glycine max PE 2 SV 1 | 67906 | 6.1461 | 781.9993 | 6.5867 | 2.860914
C6SZX7 | Glutathione peroxidase OS Glycine max PE 2 SV 1 | 18491 | 6.9435 | 737.6782 | 0.1674 | 0.07271
C6T1V2 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17729 | 6.3926 | 727.8019 | 0.189 | 0.082092
C6TNU2 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 47497 | 5.6664 | 684.3593 | 0.0516 | 0.022412
P27066 | Ribulose bisphosphate carboxylase large chain OS Glycine max GN rbcL PE 1 SV 3 | 52576 | 5.976 | 675.6335 | 0.5484 | 0.238196
C6TB98 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 33906 | 5.5009 | 664.4331 | 0.3868 | 0.168005
C6T8D8 | Fructose bisphosphate aldolase Fragment OS Glycine max PE 2 SV 1 | 28937 | 7.1175 | 633.6357 | 0.3172 | 0.137775
C6T4R9 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17656 | 10.1526 | 584.8538 | 0.3291 | 0.142944
C6SZN7 | Putative uncharacterised protein OS Glycine max PE 4 SV 1 | 12980 | 5.1436 | 576.5699 | 0.1605 | 0.069713
C6TLT3 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 29708 | 10.2354 | 557.4932 | 0.0924 | 0.040134
C6TMG1 | Fructose bisphosphate aldolase OS Glycine max PE 2 SV 1 | 38315 | 7.3405 | 534.5306 | 0.3023 | 0.131303
C6T049 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17988 | 5.3859 | 529.731 | 0.1186 | 0.051514
Q6RIB7 | Enolase OS Glycine max PE 2 SV 1 | 47689 | 5.1445 | 518.4756 | 0.5378 | 0.233592
C6T4Z6 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 15883 | 10.5421 | 502.4344 | 0.0449 | 0.019502
C6SVT0 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 18011 | 6.9304 | 460.7158 | 0.0918 | 0.039873
Q39839 | Nucleoside diphosphate kinase 1 OS Glycine max PE 2 SV 1 | 16432 | 5.8828 | 458.5279 | 0 | 0
C6SYU0 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 18273 | 8.6237 | 422.0796 | 0.1511 | 0.06563
C6SZN6 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 17935 | 5.5091 | 386.4056 | 0.0485 | 0.021066
C6TG05 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 40355 | 6.5206 | 381.6409 | 0.322 | 0.13986
C6K8D0 | Trypsin inhibitor 26 kDa isoform OS Glycine max PE 2 SV 1 | 25930 | 8.4036 | 377.5146 | 0.2736 | 0.118837
C6T1R3 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 25133 | 5.9641 | 362.5658 | 0.1876 | 0.081484
Q9SPB8 | Malate dehydrogenase OS Glycine max GN Mdh1 PE 3 SV 1 | 36119 | 8.2277 | 330.7828 | 0.2051 | 0.089085
Q0GJJ9 | ACP thioesterase protein Fragment OS Glycine max GN FATB1b PE 4 SV 1 | 22492 | 5.4897 | 318.6968 | 0.8157 | 0.354297
Q9SWB4 | Poly ADP ribose polymerase 3 OS Glycine max GN PARP3 PE 2 SV 1 | 91630 | 5.2626 | 316.8825 | 0.8352 | 0.362767
C6TNI8 | Putative uncharacterised protein Fragment OS Glycine max PE 2 SV 1 | 20580 | 5.5631 | 313.7788 | 0.3245 | 0.140946
P29530 | P24 oleosin isoform A OS Glycine max PE 2 SV 2 | 23487 | 9.0505 | 298.6236 | 0.3017 | 0.131043
C6SXU0 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 27721 | 6.8196 | 295.3087 | 0.2351 | 0.102115
C6SW79 | 40S ribosomal protein S12 OS Glycine max PE 2 SV 1 | 14788 | 5.176 | 249.5643 | 0.0961 | 0.041741
P28551 | Tubulin β-chain Fragment OS Glycine max GN TUBB PE 2 SV 2 | 45721 | 5.553 | 246.5856 | 0.0953 | 0.041393
C6T7U2 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 51462 | 5.2559 | 242.4084 | 0.5776 | 0.250879
Q39801 | 51 kDa seed maturation protein OS Glycine max PE 2 SV 1 | 50951 | 6.7427 | 239.2772 | 0.2423 | 0.105242
C6SVF1 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 15968 | 6.1593 | 237.9553 | 0.0405 | 0.017591
C6TGA6 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 34167 | 4.8087 | 227.2209 | 0.0914 | 0.039699
C6SXS9 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 23473 | 5.7777 | 219.1115 | 0.1407 | 0.061113
Q8RVH5 | Basic 7S globulin 2 OS Glycine max PE 1 SV 1 | 47174 | 8.174 | 205.5323 | 0 | 0
C6TBB8 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 13300 | 8.9643 | 190.4985 | 0.1962 | 0.085219
B1Q2X4 | Protein disulfide isomerase OS Glycine max GN PDIL 1 PE 3 SV 1 | 58554 | 4.9532 | 190.1399 | 0.5267 | 0.228771
B0M1A9 | Peroxisomal 3 ketoacyl CoA thiolase OS Glycine max PE 2 SV 1 | 48585 | 7.7 | 189.2588 | 0.2631 | 0.114277
B1ACD5 | Kunitz trypsin protease inhibitor OS Glycine max PE 2 SV 1 | 22661 | 5.0623 | 177.8031 | 0 | 0
C6TNU3 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 37948 | 5.8645 | 174.8052 | 0.1576 | 0.068453
P54774 | Cell division cycle protein 48 homolog OS Glycine max GN CDC48 PE 2 SV 1 | 89713 | 5.0054 | 169.9282 | 0.8228 | 0.357381
C6T9X5 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 43553 | 5.7779 | 169.9056 | 0.226 | 0.098162
C6SXR4 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 14832 | 11.3069 | 166.6849 | 0.0521 | 0.022629
C6TCR6 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 36175 | 4.6644 | 163.951 | 0.1824 | 0.079225
C6T262 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 22373 | 7.2902 | 163.3037 | 0.317 | 0.137688
C6TN03 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 29982 | 10.6386 | 144.9159 | 0.6681 | 0.290187
C6TL46 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 29582 | 5.7539 | 142.5109 | 0.7757 | 0.336923
A4ZGT5 | Transcription factor bZIP129 Fragment OS Glycine max GN bZIP129 PE 2 SV 1 | 20603 | 9.6048 | 142.3783 | 0.784 | 0.340528
Q9SPJ6 | Maturation protein pPM32 OS Glycine max GN PM32 PE 2 SV 1 | 18871 | 5.3156 | 135.9585 | 0.2404 | 0.104417
C6TG88 | Putative uncharacterised protein Fragment OS Glycine max PE 2 SV 1 | 17907 | 4.584 | 131.2757 | 1.3555 | 0.588758
C6ZRP9 | Pti1 kinase-like protein OS Glycine max PE 2 SV 1 | 34932 | 8.9379 | 130.005 | 0.1044 | 0.045346
Q7XAC5 | Embryo specific urease OS Glycine max PE 2 SV 1 | 90099 | 5.6169 | 128.19 | 0.4533 | 0.19689
C6TJD3 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 35726 | 7.6161 | 122.8045 | 0.1528 | 0.066368
O22518 | 40S ribosomal protein SA OS Glycine max PE 2 SV 1 | 33885 | 4.9052 | 122.2737 | 0.2313 | 0.100464
C6TGJ9 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 28237 | 11.1379 | 117.5363 | 0.042 | 0.018243
Q0PJB9 | MYB transcription factor MYB131 Fragment OS Glycine max GN MYB131 PE 2 SV 1 | 36138 | 9.0851 | 111.3419 | 0.7036 | 0.305607
Q8L7J4 | Pyruvate kinase OS Glycine max PE 2 SV 1 | 55280 | 7.0847 | 109.9843 | 0.2255 | 0.097945
C6T520 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 18088 | 5.8 | 109.3591 | 0.206 | 0.089476
C6T6B2 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 19918 | 5.1687 | 104.2455 | 0.1522 | 0.066108
C6SY64 | Proteasome subunit β-type OS Glycine max PE 2 SV 1 | 24533 | 7.0461 | 103.5601 | 0 | 0
C6T470 | Putative uncharacterised protein OS Glycine max PE 2 SV 1 | 27220 | 4.5276 | 101.0626 | 0.0752 | 0.032663
B0M1A8 | Peroxisomal aminotransferase Fragment OS Glycine max PE 2 SV 1 | 31458 | 6.087 | 100.999 | 0.3781 | 0.164227

Table 2 | List of peptide sequences found by PLGS for CTAG protein.
Precursor MH+ (Da) | Charge state | MH+ Error (Da) | Score | Start | End | Sequence | Modifications | Retention Time (min) | Intensity
1715.0154 | 2.04 | 0.0032 | 7.8859 | 108 | 124 | (R)SLAQDAPPLPVPGVLLK(E) | - | 110.3133 | 19656
1349.7391 | 2 | 0.0019 | 7.2048 | 125 | 136 | (K)EFTVSGNILTIR(L) | - | 94.5072 | 7191
1871.1141 | 3 | -0.0038 | 7.0925 | 107 | 124 | (R)RSLAQDAPPLPVPGVLLK(E) | - | 102.1798 | 2493
2485.308 | 2.71 | 0.0177 | 6.1839 | 87 | 107 | (R)LLEFYLAMPFATPMEAELARR(S) | Oxidation M (8) | 141.8096 | 2994
2855.4158 | 4 | 0.0052 | 6.0799 | 82 | 106 | (R)GPESRLLEFYLAMPFATPMEAELAR(R) | Oxidation M (13) | 63.4806 | 2744

Table 3 | Troubleshooting table.
Problem | Recommendations
A contaminant with a repetitive cluster of singly charged ions is encountered during chromatography. | Use only high-quality pipette tips and tubing. Poor-quality plastics release compounds into the sample that will affect chromatography and MS analysis.
Poor peptide profile. | Digest a new sample with recently prepared high-quality trypsin. Check the pH of the sample before adding surfactant; it must be alkaline.
After MSE acquisition, PLGS processing stops with the message “failed to process raw data” or results in insufficient data. | This indicates a problem in the MS acquisition. Check the ionisation source, changing or cleaning the probe and cone as needed; also check that the GFP solution is adequately delivered for the lock mass. Inspect the raw data.
High pressure during chromatography stops the acquisition. | The column or capillary has clogged. Replace the column and capillary and ensure that the sample is digested and correctly centrifuged.
PLGS does not quantify the sample. | Ensure that ADH was added to the sample and that this information was entered in the PLGS workflow.
Contamination appears during chromatography. | Check all solutions. To avoid contamination, use only MS- and HPLC-grade reagents and deionised water with total organic compounds below 4 ppb.
PLGS does not process the database. | Introducing the database into PLGS requires that the sequences are in FASTA format with the same strings and character patterns.
Low reproducibility due to column saturation. | Keep the total protein mass loaded onto the column within the range for the column diameter: 250 ng to 500 ng for 75 μm, 440 ng to 880 ng for 100 μm, 1 μg to 2 μg for 150 μm and 4 μg to 8 μg for 300 μm.
PLGS does not show results. | If no result is displayed, check the log files and the LockMass m/z window for errors. Check the data preparation file for errors in the LockMass values.

Fig 1 | Protocol workflow. Total soluble protein from the soybean transgenic line expressing the CTAG molecule was digested with trypsin and submitted to nanoUPLC-Q-TOF MSE analysis.
Fig 2 | CTAG amino acid sequence. Coloured boxes indicate the peptides found by PLGS (Table 2), and the overlapping regions are indicated by changes in colour.
Fig 3 | Experimental spectra results.
(A) nanoUPLC chromatogram of soybean CTAG lineage 3. The arrows indicate the eluted peptides corresponding to the digested CTAG protein (Table 2). (B) MS spectrum at 110.31 min containing the [M + 2H]2+ = 857.87 ion from the CTAG protein fragment. (C) MS/MS spectrum of the [M + 2H]2+ = 857.87 precursor ion. (D) Deconvoluted MS/MS spectrum processed by PLGS and the de novo sequence corresponding to the ion [M + H]+ = 1715.01 from the trypsin digestion of CTAG.
Fig 4 | Expression analysis between samples. CTAG L37 and CTAG L3 correspond to the transgenic soybean lines, and SOY CN corresponds to the negative soybean seed. Red numbers indicate the ratio for down-regulated proteins, green numbers indicate up-regulation, and gray indicates no change in expression level. The log of the ratio and its standard deviation are shown in parentheses. The P value, which ranges from 0 to 1 (0-0.05 is considered down-regulated and 0.95-1.00 is considered up-regulated), is shown in brackets.
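The P-value convention in the Fig 4 caption can be expressed as a small decision rule. The following is a minimal sketch under stated assumptions, not the ExpressionE implementation: the function name `classify_regulation` is hypothetical, and treating the boundary values 0.05 and 0.95 as inclusive is an assumption based on the ranges quoted in the caption.

```python
def classify_regulation(p_value: float,
                        down_max: float = 0.05,
                        up_min: float = 0.95) -> str:
    """Label a protein from its probability value, following the Fig 4 caption.

    Values from 0 to down_max are labelled down-regulated, values from
    up_min to 1 are labelled up-regulated, and everything in between is
    labelled unchanged. Inclusive boundaries are an assumption.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value <= down_max:
        return "down-regulated"
    if p_value >= up_min:
        return "up-regulated"
    return "unchanged"


# Illustrative values only; these are not taken from Fig 4.
for p in (0.01, 0.50, 0.99):
    print(p, classify_regulation(p))
```

The three labels correspond to the red, gray and green colour coding described in the caption.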