R2_Answers_Reviewers_PSI_mzTab_final

REVIEWERS’ COMMENTS
Invited Reviewer 1
mzTab: exchange format for proteomics and metabolomics results
Review of specification documentation in DocProc
General thoughts:
Though the PSI has put much effort into the development of vendor-independent formats for mass
spectrometry data, these formats are XML files that are not easily parsed. These files contain all relevant
information to grasp the workflow of an analysis, starting with the machine settings and including the whole
data analysis workflow. The main purpose of these formats is the compact and complete storage of the whole
workflow up to the point of the generation of the reported file. This makes a re-analysis of the data possible
and gives interested scientists as well as journals as much information about the procedures and findings as
possible.
The proposed mzTab does not follow these guidelines. Instead, it is proposed as a lightweight, easy-to-parse
and human-readable format. But it is also not intended to replace the other formats (especially mzIdentML
and mzQuantML); rather, it gives a recommendation for an interchange format containing only the data used by
most tools for further analysis. This is a good idea as such, but it also carries the threat that the “heavier”
formats won’t be used at all and only mzTab will be used instead.
Apart from this threat, I think a well-defined, easily readable and parsable format for data interchange is a good
idea and the specification is mainly well done, though some more work should be performed.
Specific comments:
Page 7: protein inference
Allowing only one accession per peptide is very counterintuitive. One very simple example: if the mzTab file
is only used to give peptide information (which should be possible), how would it be possible to report the
connection to all possible proteins of the peptide? Referring to a protein via the "reported" accession for an
ambiguity group as in the given specification may be confusing. This group may have accessions in the
"ambiguity group", which are not valid for the given peptide, depending on the resource.
Answer: We have decided to change this in the new version of the specification. Following the
method used by various search engines (e.g. OMSSA), PSMs may be duplicated and thereby assigned
to different proteins. Even though it is still (and deliberately) not possible to completely represent the
complexity of the protein inference problem (as e.g. in mzIdentML), it is thereby possible not to lose
any data. The fact that these PSMs are duplicates can be resolved either through the (identical)
external spectral reference or through the identical search engine scores / retention times /
precursors.
Page14: Unit IDs
'IDs MUST NOT contain the prefix "_rep[1-n]" unless [...]' -> I think suffix is meant here.
Answer: This typo was corrected in the specification document
Page 14 (or anywhere):
Every section must obviously start with the header line (e.g. PRH), though this is stated neither in the
specification nor in the 10-minute guide. It is also not stated that each header row must appear only once in
the document.
Answer: All points were clarified in the specification document.
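As an illustration of these two header rules, a consuming tool could check them in a few lines. This is a hypothetical sketch, not part of the specification: the header/row prefix pairs PRH/PRT, PEH/PEP and SMH/SML are assumptions (only PRH and PRT appear in this document), the example lines are invented, and section order is not checked.

```python
# Hypothetical check for two header rules: a table row may only follow
# its section's header line, and each header line appears at most once.
# The prefix pairs below are assumptions, not quoted from the specification.
HEADER_FOR_ROW = {"PRT": "PRH", "PEP": "PEH", "SML": "SMH"}
HEADERS = set(HEADER_FOR_ROW.values())


def check_headers(lines):
    """Return a list of messages describing header-rule violations."""
    errors = []
    seen = set()
    for number, line in enumerate(lines, start=1):
        prefix = line.split("\t", 1)[0]
        if prefix in HEADERS:
            if prefix in seen:
                errors.append(f"line {number}: duplicate {prefix} header")
            seen.add(prefix)
        elif prefix in HEADER_FOR_ROW and HEADER_FOR_ROW[prefix] not in seen:
            errors.append(f"line {number}: {prefix} row before {HEADER_FOR_ROW[prefix]} header")
    return errors


good = ["MTD\tEXP_1-title\tExample", "PRH\taccession", "PRT\tP12345"]
bad = ["PRT\tP12345", "PRH\taccession", "PRH\taccession"]
print(check_headers(good))      # []
print(len(check_headers(bad)))  # 2
```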
Page 14-20:
There should be a metadata-field allowing software-settings to at least capture the most important settings of
the used software. Though it is possible via the custom-field, a defined settings field would be much easier.
Answer: The field “software[1-n]-setting” was added to the format specification.
Page 20, also applies for peptides and small molecules:
Is there any specific reason, why the columns MUST be in the order of the document? There is also the
header-row, specifying which columns appear in the protein block and thus also the ordering will be given.
Answer: We currently do not see any disadvantage, either for software developers or for end-users,
in enforcing a fixed order of the format’s columns. The main advantage of this feature is that sections
from different files (or even answers from web services) can easily be concatenated (one of the use
cases mzTab aims to support). Additionally, we believe that human users might find this convention
helpful, as all files will have the same structure. Furthermore, software viewers can be configured
according to the needs/preferences of each user.
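To illustrate the concatenation argument: when every file writes the same columns in the same order, joining two sections reduces to keeping one header line and appending the data rows. A minimal sketch, assuming each section is held as a list of tab-separated lines with the header first; the two-column header used here is shortened and hypothetical.

```python
def concatenate_sections(*sections):
    """Join table sections (header line first) that share an identical header."""
    header = sections[0][0]
    merged = [header]
    for section in sections:
        # With a fixed column order, identical headers are the only requirement.
        if section[0] != header:
            raise ValueError("headers differ; sections cannot simply be concatenated")
        merged.extend(section[1:])
    return merged


file_a = ["PRH\taccession\tunit_id", "PRT\tP12345\tEXP_1"]
file_b = ["PRH\taccession\tunit_id", "PRT\tP67890\tEXP_2"]
print(concatenate_sections(file_a, file_b))
```

A reader-side alternative would be to map columns by header name, but with a fixed order no such mapping is needed.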
Page 24 "uri", "go_terms" and "protein_coverage":
They should be optional, as they are not easily found after a peptide search (and not even given by many
search engines).
Answer: While it is true that these fields may often be unavailable, we tried to keep the
number of optional columns as low as possible. This ensures that all mzTab files have a similar
structure and are therefore easier to use for inexperienced users.
Page 25:
Why must the peptide section follow the (optional) protein section? I don't see any reason for this, except a
possible accession-parsing. However, the accessions in the peptide section don't have to match any accessions in the
protein section, and there need not be any protein section at all.
Answer: As mentioned before, we believe that a fixed order of sections and thereby a more “stable”
format makes the format easier to use. At the same time, we cannot see any disadvantage in
enforcing a certain order. People developing software that is able to generate mzTab files will on
average be more experienced than people “consuming” the files. For them, it should not make a
difference whether a certain order of sections is enforced while the latter group might find it helpful
to know “where” to find what type of information.
Page 26:
The "unique" column may be very useful, but as mzTab should be an easily generated format and not every
search engine reports this value, it should be optional.
Answer: See previous responses. If this information is not available, it is possible to just use ‘null’.
The type is specified as “Boolean”, but in the example the values "0" and "1" are used. Though it is obvious
what is meant, another way would be using "true" and "false" (or these written in capitals). It would be easier
for parsers if the specification stated whether to use 0/1, true/false or both.
Answer: This has been clarified in the specification document. Only “0”/”1” are supported.
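A parser following this rule might accept exactly "0" and "1" and, as noted in the previous answer, "null" when the information is not available. The sketch below is illustrative; rejecting spellings such as "true" outright is our reading of "only 0/1 are supported".

```python
def parse_mztab_boolean(value):
    """Map an mzTab Boolean cell to True/False, or None for "null"."""
    if value == "1":
        return True
    if value == "0":
        return False
    if value == "null":
        return None  # information not available
    # Spellings such as "true" or "FALSE" are not valid in this reading.
    raise ValueError(f"invalid Boolean value: {value!r}")


print([parse_mztab_boolean(v) for v in ("1", "0", "null")])  # [True, False, None]
```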
Page 27:
Also, "retention_time" is neither given by every search engine nor relevant for every MS method (e.g.
direct injection), and thus should be optional.
Answer: We believe that the retention time of a peptide or a small molecule is a vital piece of
information that is used and usable in many workflows. Therefore, it should be possible to report it
in mzTab. To minimize the number of optional columns this column was defined as non-optional.
Page 28:
As the charge (with today’s machines) can only be an Integer, the type should be Integer rather than Double.
Answer: We have decided to change it to an Integer in the new version of the format specification.
The uri should be optional with the same reason as above.
Answer: See answers above (the same applies).
In the description of the metadata it is said that the {UNIT_ID}_ms-file is an optional value (as all metadata
is). But the mandatory field "spectra_ref" uses this field's information. If it is not given in the metadata,
the mandatory reference format for "spectra_ref" is of no use. So either this field should be
optional or, rather, the ms-file in the metadata should be mandatory.
Answer: While the protein / peptide / small molecule sections are table based, the metadata section
is key-value based. To enable the easy concatenation of the table based sections we have decided to
minimize the number of optional columns. At the same time, the metadata section was defined as an
“all optional” section. We believe that, while this specific constellation is not ideal, these more
general rules are easier to understand.
Page 29:
Why must the small molecule section follow the (optional) protein and peptide section?
Answer: see Answer above. We prefer this decision to keep the format consistent.
Page 31:
charge: type should be Integer
retention_time: should be optional with same argument as in peptide section
Answer: see answers above.
Page 32:
uri: Though I am not used to small molecule identification, I think, it should be optional (same as in proteins
and peptides).
spectra_ref: see comment for peptides
Answer: See our answers about the same topic above.
Invited Reviewer 2
Generally I find the document well written and very useful.
Here are my comments:
The ten minutes guide:
----------------------
In the "Units in mzTab" chapter of the ten-minute guide, you specify in the developers section which
symbols to use for units. No such developer specification is given for other fields such as column names.
It would be clearer to leave the details in the main document and only refer to them.
Answer: While developing mzTab in cooperation with several research groups at different locations
we found that the concept of “Units” caused the most problems. We therefore decided to explicitly
explain this feature in the “10 minute guide”.
In the peptides section you use "ABC" as a peptide name (better to use a real peptide). This raises the question
which amino acid letters are allowed in mzTab documents and, if one uses the letters `B`, `J`, `O`, ...,
whether one shouldn't specify what they mean.
Answer: Section 5.10.5 (Reporting sequence ambiguity), including the cases of B, J, O,… has been
added to the new version of the specification document.
mzTab format specification:
---------------------------
Do protein and peptide data need to be consistent? E.g. if the PRT field num_peptides = 3, is it required that
3 peptides of this protein are listed in the file? Or do modifications of the protein also appear on the peptide
level? This may have some relevance for programmers writing parsers.
Answer: If a protein and a peptide section are present in an mzTab file it is expected that these two
sections are consistent – as in any other proteomics file format. This has been clarified in the
specification document (Section 6.1).
I would indicate for each column whether it is mandatory or not. This would help readers who just have a
quick look at the text.
Answer: In the table based sections all columns are mandatory, apart from the quantification
associated ones and – of course – the optional columns. This is highlighted in the format
specification by the word “(Optional)” after each optional column name. In the beginning of the
metadata section it explicitly states that “All fields in the metadata section are optional.”
In some applications spectrum matches to the same peptide are combined. For example, in quantitative
proteomics it could make sense to combine peptides with different charge states or chemical modifications.
Or in spectrum clustering a peptide could match to a consensus spectrum. This requires that one PEP entry
would have several spectra, which does not seem to be possible in mzTab. Could you comment on this issue
and add some clarification to the text?
Answer: Peptides can be linked to multiple spectra. This is explicitly defined in the mzTab
specification: “Multiple spectra MUST be referenced using a “|” delimited list.” Thereby, a peptide
identification can be linked to multiple spectra, even from different source files.
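Combining the "|" delimiter with the ms_file[1-n]:{SPEC_REF} reference format given for spectra_ref (section 6.5.16), a consumer could recover the individual spectrum references as sketched below. The regular expression and the scan=... native IDs in the example are assumptions for illustration; SPEC_REF itself is kept as an opaque string.

```python
import re

# Assumed shape of one reference: ms_file[<index>]:<SPEC_REF>
REF_PATTERN = re.compile(r"ms_file\[(\d+)\]:(.+)")


def parse_spectra_ref(cell):
    """Split a '|' delimited spectra_ref cell into (file index, SPEC_REF) pairs."""
    refs = []
    for part in cell.split("|"):
        match = REF_PATTERN.fullmatch(part)
        if match is None:
            raise ValueError(f"malformed spectrum reference: {part!r}")
        refs.append((int(match.group(1)), match.group(2)))
    return refs


# One peptide linked to spectra in two different source files.
print(parse_spectra_ref("ms_file[1]:scan=1296|ms_file[2]:scan=873"))
# [(1, 'scan=1296'), (2, 'scan=873')]
```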
The ratio 0/0 should be specified as NaN and not INF. I don't think that NaN should be replaced by NA,
since these are 2 different things.
Answer: NaN (and also INF) have been added to the mzTab specification for these use cases.
SC_reviewer 1
SC reviewer 1 comments:
-----------------------
4. Relationship to other specifications
=> no relationship to MIAPE? Explicitly? implicitly? not necessary?
Answer: mzTab does not aim to fulfil the MIAPE guidelines. In fact, it allows different degrees of
experimental metadata annotation. We believe that it is not necessary to give further explanation
about this. Other formats like mzIdentML and mzQuantML have been developed with the MIAPE
guidelines in mind.
=> no relationships with TraML?
Answer: TraML is a file format for SRM transition-specific data, while mzTab is focused on
identification and basic quantification of MS-based data. This version of the mzTab specification
provides no support for SRM quantification. In the future, it would definitely be possible to add it
(and this is where TraML would have a clear role).
4.1
=> Please add NEWT as a possible CV, as it is used in the examples in 6.2.23 to exemplify the unit {UNIT_ID}({SUB_ID})-species[1-n];
same for BTO, CL, DOID etc. used under 6.2.nn
Answer: The mzTab format specification does not define allowed ontologies. Any suitable ontology
may be used. The above mentioned CVs/ontologies are now included in Section 4.1 in the
specification document.
6.2.13 {UNIT_ID}-uri
=> if multiplicity is 0 ... *, please specify 6.2.13 {UNIT_ID}-uri[1-n]
otherwise | as separator for multiple uris is not appropriate and can be misleading
Answer: This field can occur multiple times (thus there can be multiple “uri” lines for a single unit)
to specify multiple URIs for a single UNIT. It would not be possible to define any single character as a
separator for URIs, as that character may itself be part of a URI. We have extended the example provided to
exemplify this convention.
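A sketch of why repeated lines are preferable to a separator: a reader that collects every value of a repeated metadata key needs no delimiter at all, so a URI that itself contains "|" survives intact. The tab-separated "MTD key value" layout and the example URIs are assumptions for illustration.

```python
from collections import defaultdict


def collect_metadata(lines):
    """Group MTD key/value lines by key; a repeated key keeps all its values."""
    metadata = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t", 2)
        if len(fields) == 3 and fields[0] == "MTD":
            metadata[fields[1]].append(fields[2])
    return dict(metadata)


lines = [
    "MTD\tEXP_1-uri\thttp://www.example.org/results/1",
    "MTD\tEXP_1-uri\thttp://www.example.org/query?ids=1|2",
]
print(collect_metadata(lines)["EXP_1-uri"])
```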
5.1. Handling updates to the controlled vocabulary
the form at http://www.psidev.info/index.php?q=node/440 points to a URL that does not exist.
=> Please provide a more stable one
Answer: This URL no longer exists in the new PSI website, so it has been removed from the
specification document.
in 5.4
=> In the text " ...for an experiment "EXP_1", the replicates must have the UNIT_IDs "EXP_1-rep[1-n]" ", "must
have" should be capitalized as "MUST have" here.
Answer: This has been updated in the specification document.
=> "Biological replicates are not explicitly supported in the same way in mzTab." therefore anything is
possible? what are the constraints? no constraints = difficult to limit imagination of people...
Answer: Experimental setups are extremely diverse and constantly evolving. Therefore, any set of
constraints would result in unsupported use cases. We therefore deliberately decided to provide
researchers with a format that is open in this respect, so that they can report the data from their
experimental designs.
in 5.5
example :
The following example shows how two different quantitative experiments can be reported in one mzTab file.
Not all labels are shown
MTD EXP_1-quantification_method [MS,MS:1001837,iTraq,]
=> in MS CV, MS:1001837 is defined as "iTRAQ quantitation analysis" ; please correct the example
accordingly.
Answer: This was corrected.
later in same example:
MTD EXP_2-quantification_method [MS,MS:100999,SILAC,] ;
=> replace text in brackets by [MS,MS:1001835,SILAC quantitation analysis,]
Answer: This was corrected.
in second example of 5.5:
Example showing how emPAI values are reported in an additional column using
MS CV parameter emPAI value (MS:1001905)
PRH accession opt_cv_MS:1001905
PRT P12345 0.658
=> the column title is opt_cv_MS:1001905 and is therefore not readable by a non-bioinformatician. Where can
one find a full-text name to help targeted users read the information? As stated in 5.10.3, anything may be
used as long as it differs from every other column name. So why specify this possibility, which is not
human-readable?
Answer: In the new version, optional columns must specify both the CV param
accession and the parameter name, following this structure:
opt_cv_{accession}_{parameter name}.
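A minimal sketch of building and reading such column titles. Two points are assumptions: spaces in the parameter name are replaced with underscores, and accessions such as MS:1001905 contain no underscore, so the first underscore after the prefix separates accession from name. Under these assumptions, names that already contain underscores do not round-trip exactly.

```python
PREFIX = "opt_cv_"


def opt_cv_column(accession, name):
    """Build a column title following opt_cv_{accession}_{parameter name}."""
    # Assumption: spaces are replaced so the title contains no whitespace.
    return f"{PREFIX}{accession}_{name.replace(' ', '_')}"


def parse_opt_cv_column(column):
    """Return (accession, parameter name) from an opt_cv_ column title."""
    if not column.startswith(PREFIX):
        raise ValueError(f"not an opt_cv column: {column!r}")
    # Assumption: accessions contain no underscore, so split at the first one.
    accession, _, name = column[len(PREFIX):].partition("_")
    return accession, name.replace("_", " ")


title = opt_cv_column("MS:1001905", "emPAI value")
print(title)                       # opt_cv_MS:1001905_emPAI_value
print(parse_opt_cv_column(title))  # ('MS:1001905', 'emPAI value')
```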
5.8
=> in the text " ... "-" must be provided as the value for each of ": please capitalize MUST
=> in the text " The reliability MUST be an integer" : change to "When a reliability value is provided, this
value MUST be an integer"
Answer: This was updated in the specification document.
in 5.9
in text: "All (identified) variable modifications as well as fixed modifications MUST be reported for every
identification."
=> this sounds nice, but if mzTab is meant to be a simplified but straight-to-the-point representation of
peptide, protein or small molecule identifications, this point is overkill. Take the simple example of
phosphopeptides: it is not important to show the position of an oxidized methionine or of an iTRAQ label when
an author wants to report a list of phosphorylation positions. I'm sure that this will not be followed. Therefore,
replace MUST with something less stringent.
Answer: This is a well-known issue when dealing with modifications. However, we believe that this
should be enforced with a MUST. Otherwise, the data will not be consistent and therefore difficult to trust.
later in text: "Furthermore, mass deltas MUST NOT be reported if the given delta can be expressed through a
known and unambiguous chemical formula."
=> as this is commonly used by tools such as Sequest and ProteinProspector, I'm not convinced at all that
you can claim this to be followed, particularly because these mods are defined by a mass delta value without
requiring a "name" for it.
Answer: We now changed the requirement level to SHOULD NOT.
about 5.10.1: the approach means that one peptide (or one protein) is represented twice (two lines) if it is
found by both Andromeda and Mascot. This is not a simple final list. Authors prefer to have one line with
both scores (this comment is based on mztab_merged_example.txt).
=> An appropriate example is probably missing, as one can encode
[MS,MS:1001171,Mascot score,50]|[MS,MS:1001155,Sequest:xcorr,2] according to 6.3.9
Answer: The merged example demonstrates how multiple mzTab files can be merged by simple
concatenation. It is therefore natural and intended that one peptide or protein might be represented
by two lines but with different unit ids. In contrast, files that contain search results from multiple
search engines only report every protein / peptide / small molecule once (see specification document
6.3.8-9 for the reporting format). Note that this can only be achieved through additional processing.
For example, a researcher who wants to provide a single mzTab file based on two different mzTab files
originating from different search engines can simply report each peptide or protein once, using the
format for reporting multiple scores and search engines, but has to take care to provide a new unit id
and adapt the metadata values accordingly.
about 5.10.4: about the text "This field MUST NOT be used to reference an external MS data file. MS data
files should be referenced using the method described in Section 5.2".
=> what about referencing mzML? or a specific spectrum in mzML? this is not specified in 5.2
Answer: Referencing spectra in mzML files is done using the method described in “MS:1000777”.
This method is used for any file format that has a native ID format for the spectra within the files (e.g.
mzML, mzData).
in 6 "Every line in an mzTab file must start"
=> capitalize MUST
Answer: This was updated in the specification document.
in text (under Params) "Any field that is not available should be left empty"
=> shouldn't that be "MUST be left empty"?
Answer: This was changed to MUST.
=> how are space and comma characters constrained?
About the numbers: is scientific notation (such as 1.4E10) forbidden?
Answer: No, they are allowed.
6.2
in section Unit ID: the term unit is given in lower case, whereas under 5.4 and 6.1 it is always written UNIT.
=> please be consistent
Answer: This was fixed.
about
{UNIT_ID}-{SUB_ID}-custom
and
{UNIT_ID}-custom
=> what is the difference as "-" characters are allowed in the naming of these terms
Answer: {UNIT_ID}-{SUB_ID}-custom is only applicable to the sub-samples, but the general
concept is the same. There are several experimental setups where a researcher may want to report
additional information about a specially treated / processed sample which cannot be expressed with
the fields provided. As this information may not be applicable to all subsamples (e.g. in a 4-plex
setup), optional subsample fields are allowed.
in 6.3
The protein section must always come
=> capitalize MUST
Answer: Done.
There MUST NOT be any empty cells.
=> how do we need to fill these? NA?
Answer: All empty cells need to be filled with ‘null’ if no information is available. In the new
version of the specification, ‘NA’ has been substituted by ‘null’ (INF and NaN are now also
possible).
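A sketch of how a reader might map these special values in a numeric cell. Python's float() already understands "NaN", "INF" and scientific notation such as 1.4E10 (allowed, as answered above); the explicit branches only make the mzTab spellings visible. Note that float() is more lenient than the specification (it also accepts, e.g., "nan" or "Infinity"), so a strict validator would need extra checks.

```python
import math


def parse_mztab_double(value):
    """Parse a numeric cell: 'null' -> None, 'NaN'/'INF' -> special floats."""
    if value == "null":
        return None          # no information available
    if value == "NaN":
        return math.nan      # undefined, e.g. the ratio 0/0
    if value == "INF":
        return math.inf      # infinite, e.g. a non-zero value divided by zero
    return float(value)      # also handles scientific notation such as 1.4E10


print(parse_mztab_double("1.4E10"))  # 14000000000.0
```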
The columns in the protein section MUST be in the order they are presented in this document
=> it is not said whether the columns described under 6.3 are all mandatory or not. It is overkill to add all
of these columns if they are not necessary, from the user's point of view, to explain and transfer the results (for instance
go_terms, number of peptides, taxid, etc.)
Answer: As mentioned before for Reviewer 1, we prefer to keep a number of mandatory columns. If
no information is provided, it is possible to add just ‘null’.
6.3.4 taxid and 6.3.5 species
=> what if a custom or synthesized protein? NA?
Answer: There is one CV term from NEWT (taxID) called ‘synthetic’ (accession number 32630).
6.3.7 database_version
=> today, UniProtKB version names use underscores and not hyphens. Please change
2011-11 to 2011_11
Answer: Done.
6.3.11, 12, 13
=> what about the number of PSMs? Is this considered identical to 6.3.11?
Answer: This is left to each data producer to export following its specific criteria. Software may
consider two peptides with the same sequence to be 2 peptides (2 PSMs) or just one.
We think the first approach is probably more consistent.
6.3.15 modifications
=> this sounds like reinventing the wheel and is not compatible with current outputs from some tools (forcing a
double between 0 and 1 when a given software provides an open-scale value is not a good idea). Why not
look at PEFF and at GPMDB for this?
Answer: This was changed in the specification document (please see Section 5.9, Reporting
modifications and amino acid substitutions). The use of params for modification reliability is now
mandatory, so a Double value is no longer supported.
6.3.16 uri
=> not sure what this is: a URI representing the entry in the searched database, or something else?
Answer: The URI can represent for instance the location of the protein identification in a proteomics
repository, or the name of the original mzIdentML file where that protein was detected. Please also
see section 5.10.4 of the spec document.
6.3.17 go_terms
=> why using a string list and not a string with | separators
Answer: This was changed in the format specification.
6.3.19 protein_abundance_sub[1-n] (Optional)
=> what is expected here? concentrations, number of molecules? there are no units...
Answer: In the new version of the specification, a mechanism was introduced to specify units (see
sections 6.2.32, 6.2.33 and 6.2.34).
6.4.
The peptide section must always
=>capitalize MUST
Answer: Fixed.
=> are they all required? This might potentially generate huge and unnecessary redundancy (for instance if
one database and one database version were used, which is the case for all non-merged results)
Answer: As mentioned earlier, we prefer to keep the files consistent at the price of redundancy. As
explained before, this can help to concatenate different files.
6.4.1 sequence
=> how to encode sequence ambiguity (I/L), others, and results from sequence tags experiments?
Answer: See Section 5.10.5 in the new version. I/L can be represented as ‘J’.
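For tools that need to compare an ambiguous sequence with candidate sequences, a sketch like the one below expands each ambiguity code. The mapping uses the standard IUPAC codes B (D/N), Z (E/Q) and J (I/L); that Section 5.10.5 uses exactly this mapping is an assumption.

```python
from itertools import product

# Standard IUPAC ambiguity codes (assumed to match Section 5.10.5).
AMBIGUITY = {"B": "DN", "Z": "EQ", "J": "IL"}


def expand_sequence(sequence):
    """List every unambiguous sequence that an ambiguous one can stand for."""
    options = [AMBIGUITY.get(residue, residue) for residue in sequence]
    return ["".join(choice) for choice in product(*options)]


print(expand_sequence("PEPTJDE"))  # ['PEPTIDE', 'PEPTLDE']
```

The number of expansions doubles with every ambiguous position, so this is only practical for a handful of ambiguous residues.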
6.4.5 database
=> how to describe UniProtKB/Swiss-Prot human complete proteome subset + cRAP database?
Answer: The database is reported on a per entry basis in the protein / peptide / small molecule
section. Thereby, even if a combined database is used, a single entry can only originate from one of
the underlying original databases and the problem does not occur.
6.4.8 search_engine_score
=> I want to report scores and e-values for one peptide identified by Mascot, as well as xcorr and PeptideProphet
probabilities. How do I show this?
Answer: This can be done for the same peptide, adding both scores as CV params separated by “|”.
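Such a cell, for example [MS,MS:1001171,Mascot score,50]|[MS,MS:1001155,Sequest:xcorr,2] from the comment on 5.10.1, could be split as sketched below. The sketch assumes that no parameter contains "|" and that parameter names contain no commas; it treats each param as [CV label, accession, name, value].

```python
def parse_param_list(cell):
    """Split a '|' separated list of [CV label, accession, name, value] params."""
    params = []
    for item in cell.split("|"):
        if not (item.startswith("[") and item.endswith("]")):
            raise ValueError(f"malformed param: {item!r}")
        # Assumption: names contain no commas, so at most three splits suffice.
        cv_label, accession, name, value = item[1:-1].split(",", 3)
        params.append((cv_label, accession, name, value))
    return params


scores = parse_param_list("[MS,MS:1001171,Mascot score,50]|[MS,MS:1001155,Sequest:xcorr,2]")
for cv_label, accession, name, value in scores:
    print(accession, name, float(value))
```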
6.4.11 retention_time
=> why constrain it to seconds? Minutes, relative retention time or even a retention index might be simpler
and actually more often used, depending on the application
Answer: See answer above about how units can now be represented in mzTab (see sections 6.2.32,
6.2.33 and 6.2.34).
6.4.12 charge
=> why a Double?
Answer: See previous answers to the same topic. We decided to change it to an Integer.
6.4.14 uri
=> I thought it could be the pointer to an mzIdentML file position... So how can this be done?
Answer: It is a pointer to the mzIdentML files where this identification is reported. However, we
think it is not needed to go further and specify the location in the file.
6.5
The small molecule section must always
=> capitalize MUST
Answer: Fixed.
6.5.3 chemical_formula
Elements should be capitalized properly to avoid confusion
=> I would change should to MUST ...
Answer: Fixed.
The chemical formula reported should refer to the neutral form.
Charge state is reported by the charge field. This permits the comparison of positive and negative mode
results
=> No, only the chemical formula of the neutral form plus the charge state is not sufficient. What about adducts? And how to
exemplify this for glycine (C2H5NO2): the protonated form is C2H3NO2 with charge 1 and the deprotonated
(negative) form is C2H4NO2 with charge -1. But the sodiated form is C2H4NO2Na with charge 1. And if I take the example
of a doubly charged species, it gets even worse... How do you want to compare the chemical formulae?
Please be more precise about what is expected or remove the comment.
Answer: We thank the reviewer for pointing this out. We now note that mzTab offers support for
adducts via the modifications column in the specification document.
For example:
- a sodiated glycine is reported with formula: C2H5NO2, modifications: CHEMMOD:+Na-H and
charge: 1,
- the deprotonated (negative) form is reported with formula: C2H5NO2, modifications:
CHEMMOD:-H and charge: -1, and
- the protonated form with formula: C2H5NO2, modifications: CHEMMOD:+H and charge: 1.
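To show how a consumer could combine the three fields, the sketch below computes the m/z of the protonated and deprotonated glycine examples from the neutral formula, the CHEMMOD mass delta and the charge. The helper names are hypothetical, only C, H, N and O are supported, and the CHEMMOD delta is passed as a formula plus a sign rather than parsed from the CHEMMOD string.

```python
import re

# Monoisotopic masses (u) of the elements used below and of the electron.
MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
ELECTRON = 0.00054857990946


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C2H5NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MASS[element] * (int(count) if count else 1)
    return total


def mass_to_charge(neutral_formula, delta_formula, sign, charge):
    """m/z after adding (sign=+1) or removing (sign=-1) delta_formula.

    Each unit of positive charge removes one electron; negative charge adds one.
    """
    mass = formula_mass(neutral_formula) + sign * formula_mass(delta_formula)
    return (mass - charge * ELECTRON) / abs(charge)


print(round(mass_to_charge("C2H5NO2", "H", +1, 1), 4))   # 76.0393, protonated
print(round(mass_to_charge("C2H5NO2", "H", -1, -1), 4))  # 74.0248, deprotonated
```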
6.5.6 description
if it is allowed to provide a list of identifiers, then it should also be possible to have multiple descriptions
(the same is true for inchi...)
=> is it allowed? If not, please add a constraint under 6.5.1; if yes, please make sure there is no ambiguity
left...
Answer: More than one description, InChI and SMILES are now supported.
6.5.9 retention time
=> same as 6.4.11
Answer: Same answer applies.
6.5.16 spectra_ref
The reference must be in the format
ms_file[1-n]:{SPEC_REF} where SPEC_REF must follow the format defined in mzIdentML
=> but according to the specs, it is possible to refer to something other than an mzIdentML file, which can have
another way to index or point to a spectrum. Please allow more options
Answer: We are not sure what the reviewer means. In mzTab it is possible to reference
external spectra in the same way as is done in the mzIdentML format specification. We think that all
the required options are covered there; to allow for further options in the future, only new CV params would
need to be created (see section 5.2).
7. Conclusions
"These artefacts are currently undergoing the PSI document process standardization process"
=> remove "standardization process"
Answer: Done.
Other question: why not use the PSI-MS CV for terms? No relationships to mzIdentML terms? Just
independent terms?
Answer: We do not quite understand the reviewer's comment. The mzTab format specification does
not exclude any CVs but actually recommends the use of the PSI-MS CV.
Public commenter 1
I was pleased to see this format proposed as I think it's important to allow people to exchange data in a
tab-delimited format. I think the specification looks good and the supporting documents are good. I'm concerned
that you have too many required columns, but I guess the easy-out is to just put NA when you don't know the
value or it's not applicable.
I would not require that the columns must be in the specific order; downstream software should be able to
parse the header line of each section to determine the columns that are present and the order that they're in.
By requiring this order of columns you lock yourself into specific columns in a specific order long-term.
Also, this would allow users to simply leave out a column if it's not applicable. That way you don't get an
entire column of NA values.
Again, downstream software can read the header line, see what's there, and for the columns that it knows
about that aren't there, it can just internally record NA.
Answer: As mentioned before, we currently do not see any disadvantage, either for software
developers or for end-users to enforce a fixed order of the format’s columns. The main advantage of
this feature is that sections from different files (or even answers from web services) can easily be
concatenated (one of the use cases mzTab aims to support). Additionally, we believe that human
users might find this convention helpful as all the files will have the same structure.
I don't like having to record NA for an empty cell, though I can understand having the requirement. Still, I
don't like it; software can easily parse two tabs in a row as meaning there is an empty cell.
Answer: We agree that this is not completely ideal, but we still think it can solve a lot of potential
problems. In the new version of the specification, ‘NA’ has been substituted by ‘null’ (NaN and
INF are now also possible).
Did you consider listing database, database_version, and search_engine in the Metadata section? By
including those in the protein section and in the peptide section you're replicating the same data on every
line, thus leading to file-bloat. The only instance I could see where you would need these in the peptide
section is if the mzTab document includes search results from multiple search engines.
Answer: This granularity allows results from different search engines to be combined more
efficiently. Also, the concatenation of files is made more consistent if these three essential pieces of
information are annotated per protein/peptide/small molecule.
One additional thought: did you consider having an optional section for mass spectral data (scan, m/z, and
intensity)? If I wanted to exchange MS data (either MS1 spectra or MS2 spectra) in a text format, what would
be the suggested format? The MS2 format comes to mind
(http://noble.gs.washington.edu/proj/crux/ms2-format.html), but I believe that is specific to MS/MS data. I
realize we don't need yet another file format for MS data, but you have defined a clear text-based format here
for the proteins and peptides, so I thought perhaps people might also want to include some important mass
spectra using this format (likely not full mass spectra, just some key peaks).
Answer: We think there are already quite a few formats for reporting mass spectra in a text format.
Some of them are incomplete (pkl, dta) but others allow the reporting (optionally) of rich information
such as mgf or MS2. It is outside the scope of mzTab.
portion 2:
---------
I have another suggestion: I think it would help readability and would provide some error checking to
include the residue in the modifications. For example, instead of:
accession      modifications
gi|10181184    13[0.8]-UNIMOD:35,29[0.2]|35[0.4]-UNIMOD:21
gi|1050551     50-MOD:01499,K59-MOD:01499
IPI00000980    53[68.0]-MOD:00016
IPI00002824    NA
Use:
accession      modifications
gi|10181184    N13[0.8]-UNIMOD:35,Q29[0.2]|V35[0.4]-UNIMOD:21
gi|1050551     E70-MOD:01499,K79-MOD:01499
IPI00000980    E53[68.0]-MOD:00016
IPI00002824    NA
Answer: We have preferred to report only the position of the amino acid (without the amino acid
itself). We do not think that this redundancy is needed.
Also, is there a reason why the iTRAQ mods (MOD:01499) weren't being shown at the protein level in
mztab_merged_example.txt?
Answer: Thanks to the reviewer for pointing this out. The example file was fixed.
portion 3:
---------
The iTRAQ mods are included at the protein level in PRIDE_Exp_Complete_Ac_16649.xml-mztab.txt so
that makes me feel a little better. I was worried that wasn't allowed, but now I see that I was wrong.
Public commenter 2
I have read and considered the mzTab document and most examples. It addresses well the many issues of
reporting proteomics data and gives itself room for flexibility. It was readable in spite of the underlying
complexity that this field of work imposes.
I noticed the following point in the documentation and examples. Page 6: The data type and terms included:
"WIFF nativeID format" but did not specify source, i.e. ABSciex/ABI (maybe not important)
Answer: We think this is not really very important. This information should be included in the PSI-MS CV.
The example files did not detail as many MTD entries as are described in the documentation. If similar
documents are likely to be previewed by potential users in the future, a better representation of the terms
would be useful.
Answer: We have tried to improve the examples, with more metadata information. However, it is
important to highlight that all metadata information is optional.
Maybe I misread the description of the terminology, but the iTRAQ example contains many uncharacteristic
protein abundance values, i.e. there is one value of unity and several that are around 60 thousand (+/- 60k).
This again is likely unimportant in the framework of the document.
Answer: There are several ways to report the results of an iTRAQ study. Our example covers
reporting of ratios (as specified in the metadata section). These can be normalized to the 114
channel (as done in our example), yielding unity for the first channel and relative ratios for the other
ones. Also, variation over several orders of magnitude is expected, depending on the biological
sample and experimental setup.
Suggested text changes. Page 3:
I see "support. Section Error! Reference source not found."
Answer: Now corrected.
Page 4:
"The following use cases have driven the development of the mzTab data model,"
The following cases of usage have driven the development of the mzTab data model,......
Answer: Now corrected.
Page 4:
"The specification described in this document is not being developed in isolation; indeed, it is designed to be
complementary to, and thus used in conjunction with, several existing and emerging models. Related
specifications include the following:"
The specification described in this document has not been developed in isolation....
Page 5: "The CV has been generated by collection of terms from software vendors and academic groups
working in the area of mass spectrometry and proteome informatics."
The CV has been generated with a collection of terms from.......
Answer: All these text changes have been done.
Many thanks to the hardworking members of the PSI community.