Textfile Format for Quantitative Data
General Notes
To incorporate quantitative data into PRIDE XML files, the new PRIDE Converter supports
quantitative data in the text-based format presented here. The data is then incorporated into
PRIDE XML files using the new standard for reporting quantitative data.
File Format
The file is a simple tab-delimited text file consisting of three sections, separated by empty
lines: 1) the metadata section, 2) the protein table, and 3) the (optional) peptide table.
General note: Since the page is not wide enough to display every line in full, long lines are
wrapped; a "…" at the end of a printed line means that the line continues on the next printed line.
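Reading such a file starts with splitting it into its sections at the empty lines. The following is a minimal Python sketch, not part of PRIDE Converter; the function name split_sections is hypothetical, and it assumes that one or more empty lines separate the sections and that no section contains an empty line itself.

```python
def split_sections(text: str) -> list[list[str]]:
    """Split a quantitative-data text file into its sections.

    Hypothetical helper (not a PRIDE Converter API). Assumes sections are
    separated by one or more empty lines and that no section contains an
    empty line itself.
    """
    sections: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():  # also handles \r\n line endings
        if line.strip() == "":
            # An empty line closes the section collected so far.
            if current:
                sections.append(current)
                current = []
        else:
            current.append(line)
    if current:
        sections.append(current)
    # Metadata and protein table are mandatory; the peptide table is optional.
    if len(sections) not in (2, 3):
        raise ValueError(f"expected 2 or 3 sections, found {len(sections)}")
    return sections


example = (
    "quantification_method: iTRAQ\n"
    "quantification_level: protein\n"
    "intensity_measurement: absolute\n"
    "\n"
    "protein_accession\tprotein_abundance_subsample1\n"
    "P12345\t28\n"
)
metadata, proteins = split_sections(example)
print(len(metadata), proteins[1])
```

Treating several consecutive empty lines as one separator is a lenient choice; a stricter reader could reject them instead.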
Short Summary
quantification_method: iTRAQ
quantification_level: ENUM(protein, peptide)
intensity_measurement: ENUM(absolute, relative)
subsample[1-n]_description: Human dendritic cells untreated
subsample[1-n]_reagent: PRIDE [tab] PRIDE:0000116 [tab] iTRAQ reagent 116
subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] subsample[1-n]

protein_accession [tab] protein_abundance_subsample[1-n] [tab] …
protein_intensity_subsample[1-n] [tab] …
protein_relative_subsample[1-n]_subsample[1-n] [tab]
P12345 [tab] 28 [tab] 0.3 [tab] 7.5

protein_accession [tab] peptide_sequence [tab] unique_identifier [tab] …
peptide_intensity_subsample[1-n] [tab] …
peptide_relative_subsample[1-n]_subsample[1-n] [tab]
P12345 [tab] ABCD [tab] 12_54 [tab] 30 [tab] 0.4 [tab] 10
P12345 [tab] CDE [tab] 73_1 [tab] 25 [tab] 0.1 [tab] 5
Description
Header
As in the gel-based data format, the header consists of key-value pairs:
Field Name | Type | Multiplicity | Description
quantification_method | String | 1..1 | The quantification method used. Allowed values are: iTRAQ, SILAC, ICAT, … (list not yet complete)
quantification_level | String | 1..1 | Describes whether the reported method quantifies the data at the protein or at the peptide level. Allowed values are: protein, peptide
intensity_measurement | String | 1..1 | Describes whether absolute or relative intensities are being reported. Allowed values are: absolute, relative
subsample[1-n]_description | String | 1..n | A human-readable description of every subsample. The input from this field will be reported as the value of a cvParam.
subsample[1-n]_reagent | cvParam | 1..n | A cvParam describing the reagent used to label this subsample.
subsample_cvparam | cvParam | 0..n | Additional cvParams describing the subsample, for example the species as a NEWT parameter or the tissue as a BRENDA parameter. The subsample is referenced through the cvParam's value, which must be in the format subsample[1-n].
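The header rules above (key-value lines, the 1..1 fields, the enumerated values and the subsample[1-n] reference in subsample_cvparam) can be checked mechanically. The following is a minimal Python sketch, not PRIDE Converter code; parse_header, parse_value and CvParam are hypothetical names. It assumes, based on the examples on this page, that each line has the form "key: value" and that a cvParam value is written as cv label [tab] accession [tab] name, optionally followed by [tab] value.

```python
import re
from typing import NamedTuple, Optional, Union


class CvParam(NamedTuple):
    # Hypothetical representation of a cvParam written as tab-separated parts.
    cv_label: str
    accession: str
    name: str
    value: Optional[str] = None


# Enumerations from the header table; quantification_method is left open
# because its list of allowed values is not yet complete.
ALLOWED_VALUES = {
    "quantification_level": {"protein", "peptide"},
    "intensity_measurement": {"absolute", "relative"},
}
REQUIRED_ONCE = ("quantification_method", "quantification_level", "intensity_measurement")


def parse_value(value: str) -> Union[str, CvParam]:
    # Assumption: a value containing tabs is a cvParam.
    if "\t" not in value:
        return value
    parts = [part.strip() for part in value.split("\t")]
    if len(parts) not in (3, 4):
        raise ValueError(f"malformed cvParam: {value!r}")
    return CvParam(*parts)


def parse_header(lines: list[str]) -> dict[str, list]:
    """Collect header values per key; keys such as subsample_cvparam may repeat."""
    header: dict[str, list] = {}
    for line in lines:
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"not a key-value line: {line!r}")
        header.setdefault(key.strip(), []).append(parse_value(value.strip()))
    for key in REQUIRED_ONCE:
        if len(header.get(key, [])) != 1:
            raise ValueError(f"{key} must occur exactly once")
    for key, allowed in ALLOWED_VALUES.items():
        if header[key][0] not in allowed:
            raise ValueError(f"{key}: {header[key][0]!r} not in {sorted(allowed)}")
    for param in header.get("subsample_cvparam", []):
        # The cvParam's value names the subsample it describes.
        if not isinstance(param, CvParam) or not re.fullmatch(r"subsample\d+", param.value or ""):
            raise ValueError(f"subsample_cvparam must reference subsample[1-n]: {param!r}")
    return header


hdr = parse_header([
    "quantification_method: iTRAQ",
    "quantification_level: peptide",
    "intensity_measurement: absolute",
    "subsample_cvparam: NEWT\t9606\tHomo sapiens (Human)\tsubsample1",
])
print(hdr["subsample_cvparam"][0].accession)  # 9606
```

Splitting only at the first colon keeps accessions such as SEP:00180 intact inside cvParam values.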
Protein Table
The protein table reports quantitative information at the protein level. It should contain only
the columns that apply to the quantitation method used. For example, in iTRAQ experiments, where
no protein intensities are measured directly, the protein_intensity_subsample[1-n] columns should
not be used. The order of the columns in this table is flexible.
Field Name | Type | Description
protein_accession | String | The accession identifying the protein.
protein_abundance_subsample[1-n] | Double | Used to report the calculated protein abundance when a peptide-centric quantitation method is used. As already discussed, current quantitation methods generally produce ambiguous results. Thus, human interpretation is required to deduce quantitative information at the protein level.
protein_intensity_subsample[1-n] | Double | Used to report quantitative information that was generated at the protein level (e.g. in gel-based approaches).
protein_relative_subsample[1-n]_subsample[1-n] | Double | Used to report the relative protein abundance between two subsamples.
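Because the column order is flexible, a reader has to identify each column by its name rather than by its position. The following minimal Python sketch (parse_protein_table and PROTEIN_COLUMNS are hypothetical names, not part of PRIDE Converter) maps every row to values keyed by column kind and subsample numbers; it assumes one row per accession, real tab characters as separators, and a numeric value in every quantitative cell.

```python
import re

# Hypothetical column patterns derived from the field names above; the
# captured groups are the subsample numbers.
PROTEIN_COLUMNS = {
    "abundance": re.compile(r"protein_abundance_subsample(\d+)"),
    "intensity": re.compile(r"protein_intensity_subsample(\d+)"),
    "relative": re.compile(r"protein_relative_subsample(\d+)_subsample(\d+)"),
}


def parse_protein_table(lines: list[str]) -> dict[str, dict[tuple, float]]:
    """Map each accession to {(kind, subsample numbers...): value}."""
    columns = [name.strip() for name in lines[0].split("\t")]
    if "protein_accession" not in columns:
        raise ValueError("the protein_accession column is required")
    accession_index = columns.index("protein_accession")

    # Work out what each quantitative column holds, e.g. ("relative", 1, 3).
    keys: dict[int, tuple] = {}
    for index, name in enumerate(columns):
        if index == accession_index:
            continue
        for kind, pattern in PROTEIN_COLUMNS.items():
            match = pattern.fullmatch(name)
            if match:
                keys[index] = (kind, *(int(g) for g in match.groups()))
                break
        else:
            raise ValueError(f"unknown protein table column: {name!r}")

    table: dict[str, dict[tuple, float]] = {}
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("\t")]
        if len(cells) != len(columns):
            raise ValueError(f"expected {len(columns)} cells in row: {line!r}")
        accession = cells[accession_index]
        if accession in table:
            raise ValueError(f"duplicate accession: {accession}")
        table[accession] = {key: float(cells[i]) for i, key in keys.items()}
    return table


dige = parse_protein_table([
    "protein_accession\tprotein_intensity_subsample1\t"
    "protein_intensity_subsample2\tprotein_intensity_subsample3",
    "P12345\t18\t21\t0",
    "P12346\t3\t29\t1002",
])
print(dige["P12346"][("intensity", 3)])  # 1002.0
```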
Peptide Table
The optional second table reports quantitative data that was generated at the peptide level.
Field Name | Type | Description
protein_accession | String | The accession of the protein the peptide belongs to.
peptide_sequence | String | The peptide's sequence.
unique_identifier | String | Exactly the same identifier that the DAO uses; it is required to link the entry unambiguously to a specific peptide (more precisely, a spectrum). Although developers of other tools do not have the DAO unique_identifier when they create the file, for most search engines (certainly Mascot, X!Tandem and SpectrumMill) it can be derived from our DAO documentation. Mascot, for example, already exports the quantitative information together with the fields required for the unique identifier in its CSV format. A list of unique identifiers per DAO is given at the end of this section.
peptide_intensity_subsample[1-n] | Double | The peptide's intensity in the respective subsample.
peptide_relative_subsample[1-n]_subsample[1-n] | Double | The peptide's relative intensity comparing one subsample against another. For example, peptide_relative_subsample1_subsample3 is the relative intensity comparing subsample1 against subsample3.
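A peptide row combines three fixed text columns with a variable set of intensity and relative-intensity columns. The following minimal Python sketch shows one way to read it; PeptideRow and parse_peptide_table are hypothetical names, not PRIDE Converter classes, and the sketch assumes real tab separators and numeric quantitative cells. Rows are kept in a list because, as in the iTRAQ example, the same protein and peptide sequence can occur several times with different unique identifiers.

```python
import re
from dataclasses import dataclass, field

INTENSITY = re.compile(r"peptide_intensity_subsample(\d+)")
RELATIVE = re.compile(r"peptide_relative_subsample(\d+)_subsample(\d+)")
REQUIRED = ("protein_accession", "peptide_sequence", "unique_identifier")


@dataclass
class PeptideRow:
    protein_accession: str
    peptide_sequence: str
    unique_identifier: str
    # subsample number -> intensity
    intensity: dict[int, float] = field(default_factory=dict)
    # (a, b) -> relative intensity comparing subsample a against subsample b
    relative: dict[tuple[int, int], float] = field(default_factory=dict)


def parse_peptide_table(lines: list[str]) -> list[PeptideRow]:
    """Parse the optional peptide table; one PeptideRow per data line."""
    columns = [name.strip() for name in lines[0].split("\t")]
    missing = [name for name in REQUIRED if name not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    rows = []
    for line in lines[1:]:
        cells = [cell.strip() for cell in line.split("\t")]
        if len(cells) != len(columns):
            raise ValueError(f"expected {len(columns)} cells in row: {line!r}")
        record = dict(zip(columns, cells))
        row = PeptideRow(*(record[name] for name in REQUIRED))
        for name, cell in record.items():
            if (m := INTENSITY.fullmatch(name)):
                row.intensity[int(m.group(1))] = float(cell)
            elif (m := RELATIVE.fullmatch(name)):
                row.relative[(int(m.group(1)), int(m.group(2)))] = float(cell)
            elif name not in REQUIRED:
                raise ValueError(f"unknown peptide table column: {name!r}")
        rows.append(row)
    return rows


rows = parse_peptide_table([
    "protein_accession\tpeptide_sequence\tunique_identifier\t"
    "peptide_intensity_subsample1\tpeptide_intensity_subsample2\tpeptide_intensity_subsample3",
    "P12345\tABC\t15_1\t20\t4\t10",
    "P12346\tXYC\t721_8\t15\t0\t92",
])
print(rows[1].unique_identifier, rows[1].intensity)  # 721_8 {1: 15.0, 2: 0.0, 3: 92.0}
```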
Unique Identifiers
Search Engine | Unique Identifier Format | Description
Mascot | [query id]_[rank] | The unique identifier is built from the query's (= the spectrum's) id and the identification rank.
X!Tandem | [protein id] | The X!Tandem DAO uses the domain's id (e.g. 1000.2.1) as the unique identifier.
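For Mascot, the [query id]_[rank] format can be produced and taken apart with simple string handling. The following Python sketch uses hypothetical function names; it assumes both parts are non-negative integers, as in the examples on this page (e.g. 12_54), and groups identifiers by query to show which peptide rows stem from the same spectrum.

```python
from collections import defaultdict


def mascot_unique_identifier(query_id: int, rank: int) -> str:
    """Build a Mascot unique identifier in the form [query id]_[rank]."""
    return f"{query_id}_{rank}"


def split_mascot_unique_identifier(identifier: str) -> tuple[int, int]:
    """Split '[query id]_[rank]' back into (query id, rank)."""
    query_id, sep, rank = identifier.partition("_")
    if not sep or not query_id.isdecimal() or not rank.isdecimal():
        raise ValueError(f"not a Mascot unique identifier: {identifier!r}")
    return int(query_id), int(rank)


def group_by_query(identifiers: list[str]) -> dict[int, list[int]]:
    """Group identification ranks by query (= spectrum) id."""
    groups: dict[int, list[int]] = defaultdict(list)
    for identifier in identifiers:
        query_id, rank = split_mascot_unique_identifier(identifier)
        groups[query_id].append(rank)
    return dict(groups)


print(mascot_unique_identifier(15, 1))  # 15_1
print(group_by_query(["15_1", "15_2", "920_2"]))  # {15: [1, 2], 920: [2]}
```

The X!Tandem domain ids (e.g. 1000.2.1) are deliberately not parsed here, since their internal structure is not described on this page.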
Examples
An example of how to report gel-based quantitation in which three biological samples were
multiplexed:
quantification_method: SEP [tab] SEP:00180 [tab] difference gel electrophoresis
quantification_level: protein
intensity_measurement: absolute
subsample1_description: untreated human dendritic cells
subsample2_description: IL-1 beta stimulated human dendritic cells
subsample3_description: TNF-alpha stimulated human dendritic cells
subsample1_reagent: cy3
subsample2_reagent: cy5
subsample3_reagent: cy7
subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] subsample1
subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] subsample2
subsample_cvparam: NEWT [tab] 9606 [tab] Homo sapiens (Human) [tab] subsample3
subsample_cvparam: CL [tab] CL:0000451 [tab] dendritic cell [tab] subsample1
subsample_cvparam: CL [tab] CL:0000451 [tab] dendritic cell [tab] subsample2
subsample_cvparam: CL [tab] CL:0000451 [tab] dendritic cell [tab] subsample3

protein_accession [tab] protein_intensity_subsample1 [tab] …
protein_intensity_subsample2 [tab] protein_intensity_subsample3
P12345 [tab] 18 [tab] 21 [tab] 0
P12346 [tab] 3 [tab] 29 [tab] 1002
P12347 [tab] 900 [tab] 29 [tab] 3
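A file like the one in this example can also be generated programmatically, which avoids typing the [tab] placeholders by hand. The following is a minimal Python sketch; write_quant_file and metadata_line are hypothetical names, cvParams are represented as plain tuples, and the "key: value" layout with one space after the colon is assumed from the examples on this page.

```python
from typing import Optional, Sequence, Union


def metadata_line(key: str, value: Union[str, tuple]) -> str:
    """Write one header line; the parts of a cvParam tuple are joined with real tabs."""
    if isinstance(value, tuple):
        value = "\t".join(value)
    return f"{key}: {value}"


def write_quant_file(
    metadata: Sequence[tuple[str, Union[str, tuple]]],
    protein_table: Sequence[Sequence[object]],
    peptide_table: Optional[Sequence[Sequence[object]]] = None,
) -> str:
    """Render the metadata and tables, separated by one empty line each."""
    sections = [[metadata_line(key, value) for key, value in metadata]]
    for table in (protein_table, peptide_table):
        if table is not None:
            sections.append(["\t".join(str(cell) for cell in row) for row in table])
    return "\n\n".join("\n".join(lines) for lines in sections) + "\n"


text = write_quant_file(
    metadata=[
        ("quantification_method", ("SEP", "SEP:00180", "difference gel electrophoresis")),
        ("quantification_level", "protein"),
        ("intensity_measurement", "absolute"),
        ("subsample1_description", "untreated human dendritic cells"),
        ("subsample1_reagent", "cy3"),
        ("subsample_cvparam", ("NEWT", "9606", "Homo sapiens (Human)", "subsample1")),
    ],
    protein_table=[
        ["protein_accession", "protein_intensity_subsample1",
         "protein_intensity_subsample2", "protein_intensity_subsample3"],
        ["P12345", 18, 21, 0],
        ["P12346", 3, 29, 1002],
        ["P12347", 900, 29, 3],
    ],
)
print(text)
```

Only part of the example metadata is reproduced; the remaining subsamples would be added the same way.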
An iTRAQ example with three biological samples:
quantification_method: iTRAQ
quantification_level: peptide
intensity_measurement: absolute
subsample1_description: untreated human dendritic cells
subsample2_description: IL-1 beta stimulated human dendritic cells
subsample3_description: TNF-alpha stimulated human dendritic cells
...

protein_accession [tab] protein_abundance_subsample1 [tab] …
protein_abundance_subsample2 [tab] protein_abundance_subsample3
P12345 [tab] 15 [tab] 4 [tab] 11
P12346 [tab] 13 [tab] 36 [tab] 51

protein_accession [tab] peptide_sequence [tab] unique_identifier [tab] …
peptide_intensity_subsample1 [tab] peptide_intensity_subsample2 [tab] …
peptide_intensity_subsample3
P12345 [tab] ABC [tab] 15_1 [tab] 20 [tab] 4 [tab] 10
P12345 [tab] CDE [tab] 109_4 [tab] 10 [tab] 2 [tab] 12
P12345 [tab] ABC [tab] 920_2 [tab] 15 [tab] 7 [tab] 12
P12346 [tab] XYC [tab] 721_8 [tab] 15 [tab] 0 [tab] 92
P12346 [tab] ZYS [tab] 10_2 [tab] 10 [tab] 72 [tab] 10
An example of relative quantification using three samples (metadata omitted; only the protein
and peptide tables are shown):
protein_accession [tab] protein_relative_subsample1_subsample3 [tab] …
protein_relative_subsample2_subsample3
P12345 [tab] 5.2 [tab] 60

protein_accession [tab] peptide_sequence [tab] unique_identifier [tab] …
peptide_relative_subsample1_subsample3 [tab] …
peptide_relative_subsample2_subsample3
P12345 [tab] ABC [tab] 25_1 [tab] 0.3 [tab] 20
P12345 [tab] CDE [tab] 92_1 [tab] 10 [tab] 100