Answers to reviewers_MIAPE_Quant_v0.91

advertisement
Comments from anonymous external reviewer 1:
Section 1.1:
It might help to include a field for keywords indicating which disease, model,… was studied.
Added the text “Include keywords indicating which disease, model… was studied” in the description.
Section 1.2:
Addition agreed. Added “including e-mail address” in the text.
Section 2.1.1:
Given over-expression models are used (cell culture, mouse models) or spiked-in experiments: How can the
over-expressed protein (proteins) be separated from endogenous proteins?
We consider that the problem proposed by the reviewer is something inherent to that kind of experiments. Our
opinion is that to know how to solve that problem is something out of the scope of the MIAPE-Quant.
Section 2.2.1:
Was any pre-separation used, e.g. 1D gel separation in different gel pieces and how many?
Even though separation information is requested by other guidelines like MIAPE Gel Electrophoresis (GE),
MIAPE Column Chromatography (CC) or MIAPE Capillary Electrophoresis (CE), we agree with the reviewer
that it is important to know if some pre-fractionation (LC or gels) has been applied to the sample, since in these
cases the input data may have to be merged in a single data set prior to quantification, so a data transformation is
performed and it should be reported.
Therefore, we consider a good idea to add a new section about sample fractionation and how the fractioned data
has been handled. We have added a new sub-section in the “input data” section, called “3.3 Input data merging”,
that captures briefly the fractionation information of the sample(s) and whether the input data files coming from
each fraction have been merged prior to quantification.
Section 2.2.2:
Sample amount: how many ug were used.
Added a new section “2.2.2.2 Sample amount”
Section 3:
Were all samples combined in one batch or timely separated?
As we have described in a previous comment (section 2.2.1) a new section has been added “3.3 Input data
merging”, and it should include that information.
In case of time separation, how is ensured that MS runs do not show run to run differences per se?
We consider that the comment of the reviewer is an analytic problem that should be addressed at MS data
acquisition time to avoid contaminations between consecutive runs, and therefore it is out of scope of these
guidelines.
Section 4.1
What about self-made tools, e.g. for spectral counting?
Added “In case of in-house developed tools provide a brief description together with a reference if available”.
Section 4.3
“peptides with non-quantitative modification” -> it should be explained: M oxidation for example??
Text simplified from:
“Also state whether quantitation restricted to unique (i.e. non-shared) peptides, (ii) whether peptides with nonquantitative modifications are quantified, (iii) whether unmodified forms of peptides with non-quantitative
modifications are quantified”
To:
“Also state whether quantitation restricted to unique (i.e. non-shared) peptides or whether some
features/peptides have been excluded from the quantification (e.g. containing a non-quantitative modification
like oxidized methionine)”
Section 4.4
Word quantification missing
The text does not refer to the level in which the quantification has been performed. It refers to the level in which
data calculation or transformation methods have been applied. Therefore, in order to avoid misunderstandings,
we have added “… which level…the data calculation/transformation methods are performed…”.
Section 4.4.2
It might be interesting to include whether a ratio factor was used to exclude/include data and why a certain ratio
was used.
In this section the way in which the ratios are calculated should be described. The selection of a certain ratio
value as a threshold should be stated in section 4.5. since it is considered of an estimation of correctness.
Section 4.5
See my comment above concerning ratio as selection criterium.
According to the comments in section 4.4.2, we have added the text: “Also describe if any threshold value over
the ratio has been used in order to consider that a feature/peptide/protein is over/under-expressed, and why that
value has been selected.” in this section.
Comments from anonymous external reviewer 2:
1. There is no clear difference between points 2.1.1 (and 2.1.2) versus 2.2.2.3 as they all seem to describe the
groups and biological and technical replicates, thus incorporating redundant information to the report.
Point 2.1 describes the experimental design of the experiments, that is, how samples are organized in groups
and/or which replicates (biological and/or technical) have been performed. On the other hand, point 2.2 tries to
describe some information related to each one of the samples, that is, its unique name, the label (if any), and
also the group/replicate described in section 2.1 that each sample belongs to.
In practise, and in most cases, it can be reported as a single table like the one below (with more or less
information depending on the study).
Sample name
control_1
control_2
control_3
control_4
disease_1
disease_2
disease_3
disease_4
tr_disease_1
tr_disease_2
tr_disease_3
tr_disease_4
Sample
description
Healthy controls
Sepsis patient
Sepsis patients
treated with
drug test
Group
control
control
control
control
disease
disease
disease
disease
treated disease
treated disease
treated disease
treated disease
Type of
Replicate
Biological
Biological
Biological
Biological
Biological
Biological
Biological
Biological
Biological
Biological
Biological
Biological
Amount
1µg
1µg
1µg
1µg
1µg
1µg
1µg
1µg
1µg
1µg
1µg
1µg
However, for more complicated experiments (e.g. samples belonging to several groups, and groups containing
several samples), we prefer to keep sections 2.1 and 2.2.2.4 separated in order to have a more flexible
specification refering the groups and replicates defined in section 2.1 (2.1.1 and 2.1.2) from the section 2.2.2.3.
2. Comment on 4.1 and 4.2: Currently there are many quantification suites such as Proteome Discoverer and
OpenMS. Each of these softwares contain multiple modules and algorithms that can be combined to form
different pipelines. Therefore, it is not sufficient to specify the software (e.g. OpenMS v1.9) but also the
algorithms that have been used should be clearly stated at this point. This is partially addressed in 4.2 but a clear
statement in this regard should be made here.
Added “Brief description of the selected analysis pipeline if it is customizable from the software” in the text.
3. Comment on 4.3: There are many different approaches to calculate a confidence score and set a meaningful
threshold and there is a whole field in computational proteomics dealing with this issues. All have their
strengths and weaknesses and one could discuss which is the best approach, but in any case, it is necessary to
specify how the confidence score was determined and how the threshold was set (arbitrary? decoy strategy?
binomial distribution?). Moreover, some detail related to the level of filtering could also be added explicitly e.g.
Was the filter (fdr?) set at the protein level? Peptide level? etc.
Added the text “State if applicable in which level has been applied (PSM, peptide or protein level). This
includes the description of how the threshold has been determined (e.g. arbitrary, decoy strategy, binomial
distribution, etc…). “.
4. Comment on 5.1.1: In this point it is not clear whether raw data, partially processed data (e.g. normalized and
imputed) or final quantitative numbers should be attached here. It would be convenient to specify the initial
dataset (raw or partially processed: normalized and imputed) and then, the quantitative numbers referring either
to absolute quantitation or relative quantitation. Raw data would contain only values referring to areas, heights
or similar, without any confidence score yet. On the other hand processed datasets would include the quantities
(relative or absolute) with their confidence intervals, p-values, etc. according to the experimental design.
We assume that what we call “primary extracted quantification values from each feature” is actually what the
reviewer said about areas, heights or similar, and should be reported, for each feature in section 5.1.1. In some
cases, these values can have associated confidence values.
However, section 5.1.2 is requesting processed quantification values for peptide level, so for each quantified
peptides, a quantification value will be associated, and it will be the result of the aggregation of the feature
quantification values reported in 5.1.1 (how to do the aggregation is stated in 4.4.2).
Suggestions?
5. Finally some details on how one went from feature intensity to the peptide level and to the protein level
should be provided somewhere in point 4 or 5 (e.g. Was a TOP3 approach used? A nested ANOVA model?
Average of peptide fold changes?). For example, in SRM there are several ways to go from the transition level
to the peptide level: averaging all transitions, summing up all transitions, or even keeping all transitions as
individual features to be used as a nested parameter in an ANOVA analysis (see SRMstats from Olga Vitek et
col. vs. the strategy used by Steve Carr).
Inference of quantitation values from one level to other was basically requested in points 4.4.2 (“aggregation of
all features per peptide”), and point 4.4.5. (“Describe how protein quantification values are inferred from
peptide quantification values”).
We have added some examples described in your comment in the description of section 4.4.2.: “… e.g.,
aggregation of all features per peptide (average, geometric average, summing, ANOVA analysis)”.
In order to clarify the text, we have reformulated the name of section 4.4.5 (now is: “Protein quantification
values calculation and / or ratio determination from the peptide quantification values”) in concordance with
section 4.4.2. (“Quantification values calculation and / or ratio determination from the primary extracted
quantification values”), and added “and / or ratios have been inferred” in section 4.4.5 description.
Comments of public reviewer 1:
----------------------------4.4 to 5.1 quant at peptide level: should stay at peptide level unless there is additional evidence for (whole)
protein level eg western analysis.
The scope of this MIAPE Quant guidelines is focused in mass spectrometry-based quantification, which is
always based on a quantification of the peptides and secondly an inference of the protein quantification values
from the peptide ones. In case of spectral counting, the protein quantification value can be calculated directly
from the peptide spectrum matchings. We therefore think that it is always necessary to state all the process from
peptide quant to protein quant, and other evidences of protein quant values may be out of this guidelines scope.
Some experiments that are peptide centric have the conundrum of how to report the quantity of peptide and how
that translates to quant of protein. It should be mandatory to report the actual peptide sequence (as well as other
characteristics such as Mass, charge, rt) from which the quantitation was carried out.
All peptide characteristics that are used to quantify the peptides should be reported in 4.2 section. However, it is
true that there is not an explicit requirement for linking quantification values to peptide/protein information. It is
just requested to report the input data files where all these data should be there.
Therefore, for more clarification, we have added the text: “Report each quantification value together with the
appropriate information to identify the feature or peptide that is quantified (peptide sequence or peptide
sequence plus the charge state, mass, retention time, etc…).”and also “Report each quantification value
together with the appropriate information to identify the protein, such as the protein accession.”.
For large scale biomarker discovery projects from body fluids for eg- there is not a lot of concordance at the
protein level with the peptides quantitated - possibly due to degradation, protease activity etc. Some of those
peptides could be components of other proteins. Reporting the sequence, number of peptides quant is based on is
important.
The protein quantification values inference from peptide quantification values is explicitly requested in section
4.4.5. In fact, “(especially in case of shared peptides)” is added in the explanation in order to request how the
protein inference is performed specially in the case of peptides belonging to different proteins.
Also, what to do with 'missing' data points eg spectral counting is an obvious eg. Does '0' mean it is absent or
just below level of detection- can other criteria be used to report this type of data? What should be reported ... '0' for other quant data may be able to be obtained from baseline subtraction, transformation, peak areas below
the threshold. The approach used should also be reported (within section 4.4).
Missing value imputation is requested in section 4.4.1. However, we have added the text “Describe (if
appropriate) what does a ‘0’ value means, that is, whether it means absent or just below the level of detection,
etc…”
Final note:
We have changed some names of MIAPE sections in the introduction in concordance with section names in the
appendix.
Download