R2_mzTab_comments_SC_Reviewer_1_anon

SC Reviewer 1:
Global comments:
Much has been corrected in the R2 submission. Two major concerns, raised by many reviewers and either
not addressed or rejected, remain:
- Why force a high number of mandatory columns that will be empty in many cases? This list is
somewhat arbitrary and not fully justified.
- Why force an order on the mandatory columns while at the same time allowing optional columns to be
intercalated between them? As the columns have CV-based title names, a fixed order is not necessary,
and allowing intercalated optional columns will make parsers that rely on column position break.
PSI Editor ME: From my point of view the balance between optional and mandatory is indeed arbitrary;
although this will probably influence the compliance of the standard within the community, there is no
editorial necessity to change something.
Detailed comments on answers of invited reviewer 1:
Page 24 "uri", "go_terms" and "protein_coverage":
They should be optional, as they are not easily found after a peptide search (and not even given by many
search engines).
Answer: While it is true that these fields may often be unavailable, we tried to keep the
number of optional columns as low as possible. This ensures that all mzTab files have a similar
structure and are therefore easier to use for inexperienced users.
SC Reviewer 1: If mzTab is meant to be light and not heavy, this statement and choice are
counterproductive. On the one hand, mzTab requires a minimal set of columns, which is not useful for
some experiments and makes the file bigger than necessary; on the other hand, there are some
optional columns, so the choice of keeping some columns obligatory is arbitrary. This must be
corrected for consistency reasons.
PSI Editor ME: same comment as above (From my point of view the balance between optional and
mandatory is indeed arbitrary; although this will probably influence the compliance of the standard within
the community, there is no editorial necessity to change something.)
Page 25:
Why must the peptide section follow the (optional) protein section? I don't see any reason for this, except
possibly accession parsing. However, the accessions in the peptide section don't have to match any accessions
in the protein section, nor does there have to be a protein section at all.
Answer: As mentioned before, we believe that a fixed order of sections and thereby a more “stable”
format makes the format easier to use. At the same time, we cannot see any disadvantage in
enforcing a certain order. People developing software that is able to generate mzTab files will on
average be more experienced than people “consuming” the files. For them, it should not make a
difference whether a certain order of sections is enforced while the latter group might find it helpful
to know “where” to find what type of information.
SC Reviewer 1: I do not agree: a format that is not absolutely fixed is prone to parsing errors. If a
less experienced person, as you mention, does not handle the column count in a smart way,
two perfectly valid mzTab files with differing numbers of columns will generate odd results…
PSI Editor ME: same comment as above (From my point of view the balance between optional and
mandatory is indeed arbitrary; although this will probably influence the compliance of the standard within
the community, there is no editorial necessity to change something.)
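The parsing concern above can be illustrated with a minimal Python sketch that looks columns up by header name instead of by position. It assumes tab-separated lines, a header line whose first cell is `PEH`, data lines whose first cell is `PEP`, and an optional column named `opt_my_score`; these prefixes, the sample rows, and the helper name `read_section` are assumptions made for illustration, not taken from the specification under review.

```python
def read_section(lines, header_prefix="PEH", row_prefix="PEP"):
    """Return one dict per data row, keyed by column name (not by position)."""
    header = None
    rows = []
    for line in lines:
        cells = line.rstrip("\n").split("\t")
        if cells[0] == header_prefix:
            header = cells[1:]
        elif cells[0] == row_prefix:
            if header is None:
                raise ValueError("data row found before header line")
            values = cells[1:]
            if len(values) != len(header):
                raise ValueError(
                    "row has %d cells, header has %d" % (len(values), len(header))
                )
            rows.append(dict(zip(header, values)))
    return rows


# Two hypothetical files: the second has an optional column between mandatory ones.
file_a = [
    "PEH\tsequence\taccession\tcharge",
    "PEP\tPEPTIDEK\tP12345\t2",
]
file_b = [
    "PEH\tsequence\topt_my_score\taccession\tcharge",
    "PEP\tPEPTIDEK\t0.93\tP12345\t2",
]

rows_a = read_section(file_a)
rows_b = read_section(file_b)
print(rows_a[0]["charge"], rows_b[0]["charge"])
```

A consumer written this way reads the same `charge` value from both files, whereas one that takes "the third column" would read the accession from the second file.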
Page 26:
The "unique" column may be very useful, but as mzTab should be an easily generated format, it should be
optional, since not every search engine reports this value.
Answer: See previous responses. If this information is not available, it is possible to just use ‘null’.
SC Reviewer 1: Again, it’s overkill. Here in particular, the definition of unique might
vary from one tool to another, which makes the interpretation of this field non-homogeneous.
PSI Editor ME: same comment as above; additionally: definition of unique is given in 5.7 and seems
conclusive to me (i.e. “database-unique”).
Page 27:
Also "retention_time" is neither given by every search engine nor relevant for every MS method (e.g.
direct injection), and thus should be optional.
Answer: We believe that the retention time of a peptide or a small molecule is a vital piece of
information that is used and usable in many workflows. Therefore, it should be possible to report it
in mzTab. To minimize the number of optional columns this column was defined as non-optional.
SC Reviewer 1: Retention times are useful only when comparing highly similar chromatographic
conditions. This will only be valid if one compares and exchanges results that fulfil these criteria.
Even if I agree that this information can be very useful in some areas, this is not the case for many
straightforward identification and quantitation jobs.
PSI Editor ME: same comment as above (From my point of view the balance between optional and
mandatory is indeed arbitrary; although this will probably influence the compliance of the standard within
the community, there is no editorial necessity to change something.)
Page 28:
In the description of the metadata it is said, that the {UNIT_ID}_ms-file is an optional value (as all metadata
is). But the mandatory field "spectra_ref" uses these fields information. If it is not given in the metadata,
there is no use of this mandatory reference format for the "spectra_ref". So either this field should be
optional or rather the ms-file in the metadata mandatory.
Answer: While the protein / peptide / small molecule sections are table based, the metadata section
is key-value based. To enable the easy concatenation of the table based sections we have decided to
minimize the number of optional columns. At the same time, the metadata section was defined as
an “all optional” section. We believe that, while this specific constellation is not ideal, these more
general rules are easier to understand.
SC Reviewer 1: This answer should address the problem raised by this reviewer. Either add a
constraint to the metadata or accept that mzTab files can be inconsistent.
PSI Editor ME: I think this is a difference between syntax and semantics; the syntax allows ms_file to be
missing, but then – by semantics – the file is automatically not valid, because spectra_ref is mandatory.
There is no disadvantage in including this implicit “mandatoriness” in the specification document; so
please add it.
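The semantic rule described by the editor could be checked along the following lines. This is only a sketch: it assumes that both the metadata keys and the spectra_ref values contain references of the form `ms_file[n]`; that bracketed form, the sample keys and values, and the function name `undefined_ms_file_refs` are assumptions for illustration.

```python
import re

# Assumed reference pattern, e.g. "ms_file[1]" (hypothetical form).
MS_FILE_REF = re.compile(r"ms_file\[(\d+)\]")


def undefined_ms_file_refs(metadata_keys, spectra_refs):
    """Return ms_file indices used in spectra_ref but never declared in metadata."""
    defined = set()
    for key in metadata_keys:
        defined.update(MS_FILE_REF.findall(key))
    missing = set()
    for ref in spectra_refs:
        for index in MS_FILE_REF.findall(ref):
            if index not in defined:
                missing.add(index)
    return sorted(missing, key=int)


# Hypothetical example: ms_file[2] is referenced but not declared.
metadata_keys = ["EXP_1-ms_file[1]-location"]
spectra_refs = ["ms_file[1]:index=5", "ms_file[2]:index=7"]
missing = undefined_ms_file_refs(metadata_keys, spectra_refs)
print(missing)
```

A file would pass the syntactic check with an empty metadata section, but a non-empty result from a check like this one would mark it as semantically invalid.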
Detailed comments on answers of SC reviewer 1:
in 6 "Every line in an mzTab file must start"
=> capitalize MUST
Answer: This was updated in the specification document.
in text (under Params) "Any field that is not available should be left empty"
=> shouldn't that be MUST be left empty?
Answer: This was changed to MUST.
=> how are space and comma characters constrained?
SC Reviewer 1: No answer?
6.4.1 sequence
=> how to encode sequence ambiguity (I/L), other ambiguities, and results from sequence tag experiments?
Answer: See section new version 5.10.5. I/L can be represented as ‘J’.
SC Reviewer 1: And what about K/E, and sequence tags with {A1A2} where A1 and A2 are two residues, for
which we do not know the order in the sequence?
PSI Editor ME: Q/E = Z and N/D = B are already mentioned in 5.10.5. Regarding {A1A2} at the moment one
has to write XX, if I understand correctly. This is a loss of information, but from the editorial point of view
no necessity to change. @Authors: please decide whether you want to improve that now... Add that special
case / work-around into the specification document!
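To make the ambiguity codes mentioned above concrete (J for I/L, Z for Q/E, B for N/D), here is a small Python sketch that expands a sequence into the candidate sequences it stands for. The function name `expand` and the example sequence are hypothetical; X is left untouched because it carries no information about which residues are meant, which is exactly the loss of information noted for the {A1A2} case.

```python
from itertools import product

# Ambiguity codes as discussed in the comments above.
AMBIGUITY = {"J": "IL", "Z": "QE", "B": "ND"}


def expand(sequence):
    """List every concrete sequence an ambiguous sequence may stand for."""
    choices = [AMBIGUITY.get(residue, residue) for residue in sequence]
    return ["".join(combo) for combo in product(*choices)]


candidates = expand("PEJZK")
print(candidates)  # ['PEIQK', 'PEIEK', 'PELQK', 'PELEK']
```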
Other question: why not use the PSI-MS CV for terms? Is there no relationship to mzIdentML terms? Are
these just independent terms?
Answer: We do not quite understand the reviewer's comment. The mzTab format specification does
not exclude any CVs from being used but actually recommends using the PSI-MS CV.
SC Reviewer 1: It appears that the column names do not follow the PSI-MS CV terms (some
examples below):
“Charge” is “charge state” in PSI-MS : MS:1000041
“Search_engine_score” is “search engine specific score for peptides” in PSI-MS: MS:1001143
“sequence” is “unmodified peptide sequence” in PSI-MS: MS:1000888
It would be appropriate to provide in the documentation a list of PSI-MS CV terms and/or element names
in mzIdentML/mzQuantML, so that one can map the terms correctly.
PSI Editor ME: As I understand it, CV terms can be used but the above mentioned column names are “own
special labels”. Unfortunately there are also attributes in mzIdentML doubling CV terms (like
“SpectrumIdentificationItem/chargeState”). No editorial necessity to add such a list.
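The mapping list requested by the reviewer could take a form as simple as the following Python sketch. It contains only the three examples quoted above; the accession for "sequence" is written with seven digits, the usual PSI-MS accession form, and the names `COLUMN_TO_CV` and `cv_for_column` are hypothetical.

```python
# Column name -> (PSI-MS accession, PSI-MS term name), from the reviewer's examples.
COLUMN_TO_CV = {
    "charge": ("MS:1000041", "charge state"),
    "search_engine_score": ("MS:1001143", "search engine specific score for peptides"),
    "sequence": ("MS:1000888", "unmodified peptide sequence"),
}


def cv_for_column(name):
    """Return the (accession, term) pair for a column name, or None if unmapped."""
    return COLUMN_TO_CV.get(name.lower())


print(cv_for_column("Charge"))
print(cv_for_column("uri"))
```

Columns without an entry, such as `uri` here, simply return `None`, which makes the gaps in such a mapping visible.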