Textfile Format for Quantitative Data

General Notes

To incorporate quantitative data into PRIDE XML files, the new PRIDE Converter supports the text-based format presented here. The data is then written to the PRIDE XML file using the new standard for reporting quantitative data.

File Format

The file is a simple, tab-delimited text file consisting of three sections, each separated by an empty line:

1.) the metadata section,
2.) the protein table and
3.) the (optional) peptide table.

General Note: Since the page is not wide enough to display every line in full, a line that wraps on the page is marked with "…" at the break. In the actual file it is a single line.

Short Summary

    quantification_method: iTRAQ
    quantification_level: ENUM(protein, peptide)
    intensity_measurement: ENUM(absolute, relative)
    subsample[1-n]_description: Human dendritic cells untreated
    subsample[1-n]_reagent: PRIDE [tab] PRIDE:0000116 [tab] iTRAQ reagent 116
    subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] …
        subsample[1-n]

    protein_accession [tab] protein_abundance_subsample[1-n] [tab] …
        protein_intensity_subsample[1-n] [tab] protein_relative_subsample[1- …
        n]_subsample[1-n] [tab]
    P12345 28 0.3 7.5

    protein_accession [tab] peptide_sequence [tab] unique_identifier [tab] …
        peptide_intensity_subsample[1-n] [tab] peptide_relative_subsample[1- …
        n]_subsample[1-n] [tab]
    P12345 ABCD 12_54 30 0.4 10
    P12345 CDE 73_1 25 0.1 5

Description

Header

As in the gel-based data format, the header consists of key-value pairs:

| Field Name | Type | Mult | Description |
|---|---|---|---|
| quantification_method | String | 1..1 | The quantification method used. Allowed values are: iTRAQ, SILAC, ICAT, … (list not yet complete) |
| quantification_level | String | 1..1 | Describes whether the reported method quantifies the data at the protein or at the peptide level. Allowed values are: protein, peptide |
| intensity_measurement | String | 1..1 | Describes whether absolute or relative intensities are being reported. Allowed values are: absolute, relative |
| subsample[1-n]_description | String | 1..n | A human-readable description for every subsample. The input from this field will be reported as the value of a cvParam. |
| subsample[1-n]_reagent | cvParam | 1..n | A cvParam describing the reagent used to label this subsample. |
| subsample_cvparam | cvParam | 0..n | Additional cvParams describing the subsample, for example the species as a NEWT parameter or the tissue as a BRENDA parameter. The subsample is identified through the cvParam's value, which must be in the format subsample[1-n]. |

Protein Table

The protein table reports quantitative information at the protein level. It should contain only the columns that the quantification method in use can provide. For example, in iTRAQ experiments, where no direct protein intensities are measured, the column type protein_intensity_subsample[1-n] should not be used. The order of the columns in this table is flexible.

| Field Name | Type | Description |
|---|---|---|
| protein_accession | String | The accession identifying the protein. |
| protein_abundance_subsample[1-n] | Double | The abundance should be used to report the calculated protein abundance when a peptide-centric quantitation method is being used. As already discussed, current quantitation methods generally generate ambiguous results. Thus, human interpretation is required to deduce quantitative information at the protein level. |
| protein_intensity_subsample[1-n] | Double | This column should be used to report quantitative information that was generated at the protein level (e.g. in gel-based approaches). |
| protein_relative_subsample[1-n]_subsample[1-n] | Double | This field should be used to report the relative protein abundance between two subsamples. |

Peptide Table

The (optional) second table is used to report quantitative data that was generated at the peptide level.

| Field Name | Type | Description |
|---|---|---|
| protein_accession | String | The accession of the protein the peptide belongs to. |
| peptide_sequence | String | The peptide's sequence. |
| unique_identifier | String | This is exactly the same identifier as used by the DAO and is required to link the entry unambiguously to a specific peptide (or rather, spectrum). Even though developers of other tools do not have the DAO unique_identifier at the time the file is created, for most search engines (certainly Mascot, X!Tandem and SpectrumMill) it will be possible to derive it based on our DAO documentation. Mascot, for example, already exports the quantitative information plus the fields required for the unique identifier in its CSV format. A list of unique identifiers per DAO can be found at the end of this section. |
| peptide_intensity_subsample[1-n] | Double | The peptide's intensity for the respective subsample. |
| peptide_relative_subsample[1-n]_subsample[1-n] | Double | The peptide's relative intensity comparing one subsample against the other. For example, peptide_relative_subsample1_subsample3 is the relative intensity comparing subsample1 against subsample3. |

Unique Identifiers

| Search Engine | Unique Identifier Format |
|---|---|
| Mascot | [query id]_[rank]. The unique identifier is built from the query's (= the spectrum's) id and the identification rank. |
| X!Tandem | [protein id]. The X!Tandem DAO uses the domain's id (1000.2.1) as unique identifier. |

Examples

An example of how to report, for instance, gel-based quantitation where three biological samples were multiplexed:

    quantification_method: SEP [tab] SEP:00180 [tab] difference gel electrophoresis
    quantification_level: protein
    intensity_measurement: absolute
    subsample1_description: untreated human dendritic cells
    subsample2_description: Il-1 beta stimulated human dendritic cells
    subsample3_description: TNF-alpha stimulated human dendritic cells
    subsample1_reagent: cy3
    subsample2_reagent: cy5
    subsample3_reagent: cy7
    subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] subsample1
    subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] subsample2
    subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] subsample3
    subsample_cvparam: CL [tab] CL:0000451 [tab] dendritic cell [tab] subsample1
    subsample_cvparam: CL [tab] CL:0000451 [tab] dendritic cell [tab] subsample2
    subsample_cvparam: CL [tab] CL:0000451 [tab] dendritic cell [tab] subsample3

    protein_accession [tab] protein_intensity_subsample1 [tab] …
        protein_intensity_subsample2 [tab] protein_intensity_subsample3
    P12345 [tab] 18 [tab] 21 [tab] 0
    P12346 [tab] 3 [tab] 29 [tab] 1002
    P12347 [tab] 900 [tab] 29 [tab] 3

An iTRAQ example with three biological samples:

    quantification_method: iTRAQ
    quantification_level: peptide
    intensity_measurement: absolute
    subsample1_description: untreated human dendritic cells
    subsample2_description: Il-1 beta stimulated human dendritic cells
    subsample3_description: TNF-alpha stimulated human dendritic cells
    ...

    protein_accession [tab] protein_abundance_subsample1 [tab] …
        protein_abundance_subsample2 [tab] protein_abundance_subsample3
    P12345 [tab] 15 [tab] 4 [tab] 11
    P12346 [tab] 13 [tab] 36 [tab] 51

    protein_accession [tab] peptide_sequence [tab] unique_identifier [tab] …
        peptide_intensity_subsample1 [tab] peptide_intensity_subsample2 [tab] …
        peptide_intensity_subsample3
    P12345 [tab] ABC [tab] 15_1 [tab] 20 [tab] 4 [tab] 10
    P12345 [tab] CDE [tab] 109_4 [tab] 10 [tab] 2 [tab] 12
    P12345 [tab] ABC [tab] 920_2 [tab] 15 [tab] 7 [tab] 12
    P12346 [tab] XYC [tab] 721_8 [tab] 15 [tab] 0 [tab] 92
    P12346 [tab] ZYS [tab] 10_2 [tab] 10 [tab] 72 [tab] 10

An example of relative quantification using three samples (the metadata is omitted; only the protein and peptide tables are shown):

    protein_accession [tab] protein_relative_subsample1_subsample3 [tab] …
        protein_relative_subsample2_subsample3
    P12345 [tab] 5.2 [tab] 60

    protein_accession [tab] peptide_sequence [tab] unique_identifier [tab] …
        peptide_relative_subsample1_subsample3 [tab] …
        peptide_relative_subsample2_subsample3
    P12345 [tab] ABC [tab] 25_1 [tab] 0.3 [tab] 20
    P12345 [tab] CDE [tab] 92_1 [tab] 10 [tab] 100
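
The three-section layout described in this document (metadata, protein table, optional peptide table, each separated by an empty line) could be read with a short script along the following lines. This is only a minimal sketch and not part of the PRIDE Converter: the function name parse_quant_file and the returned structure are assumptions made for illustration. It also assumes that the file contains real tab characters where this document prints [tab], and that lines shown wrapped with "…" are single lines in the file.

```python
import csv
import io


def parse_quant_file(text):
    """Parse the text format into (metadata, proteins, peptides).

    Hypothetical helper for illustration; not the PRIDE Converter's own code.
    """
    # Sections are separated by one or more empty lines.
    sections, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    if len(sections) < 2:
        raise ValueError("expected a metadata section and a protein table")

    # Metadata lines are "key: value" pairs. Keys such as subsample_cvparam
    # may repeat, so each key maps to a list of values. Every value is split
    # on tabs, so a cvParam becomes a list of its fields.
    metadata = {}
    for line in sections[0]:
        key, _, value = line.partition(":")
        fields = [part.strip() for part in value.strip().split("\t")]
        metadata.setdefault(key.strip(), []).append(fields)

    def parse_table(lines):
        # The first line holds the column names; column order is flexible,
        # so rows are returned as dictionaries keyed by column name.
        reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
        return list(reader)

    proteins = parse_table(sections[1])
    # The peptide table is optional.
    peptides = parse_table(sections[2]) if len(sections) > 2 else []
    return metadata, proteins, peptides


sample = (
    "quantification_method: iTRAQ\n"
    "quantification_level: peptide\n"
    "intensity_measurement: absolute\n"
    "subsample1_description: untreated human dendritic cells\n"
    "subsample_cvparam: NEWT\t9606\tHomo sapiens (Human)\tsubsample1\n"
    "\n"
    "protein_accession\tprotein_abundance_subsample1\n"
    "P12345\t15\n"
    "\n"
    "protein_accession\tpeptide_sequence\tunique_identifier\t"
    "peptide_intensity_subsample1\n"
    "P12345\tABC\t15_1\t20\n"
)

metadata, proteins, peptides = parse_quant_file(sample)
print(metadata["subsample_cvparam"])
print(proteins)
print(peptides)
```

All values are returned as strings. A real reader would additionally convert the Double columns and validate the allowed values of quantification_level and intensity_measurement.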