Answers to reviewers_MIAPE_Quant_v0.91

Comments from anonymous external reviewer 1: Section 1.1: It might help to include a field for keywords indicating which disease, model,… was studied. Added the text “Include keywords indicating which disease, model… was studied” in the description. Section 1.2: Addition agreed. Added “including e-mail address” in the text. Section 2.1.1: Given over-expression models are used (cell culture, mouse models) or spiked-in experiments: How can the over-expressed protein (proteins) be separated from endogenous proteins? We consider that the problem proposed by the reviewer is something inherent to that kind of experiments. Our opinion is that to know how to solve that problem is something out of the scope of the MIAPE-Quant. Section 2.2.1: Was any pre-separation used, e.g. 1D gel separation in different gel pieces and how many? Even though separation information is requested by other guidelines like MIAPE Gel Electrophoresis (GE), MIAPE Column Chromatography (CC) or MIAPE Capillary Electrophoresis (CE), we agree with the reviewer that it is important to know if some pre-fractionation (LC or gels) has been applied to the sample, since in these cases the input data may have to be merged in a single data set prior to quantification, so a data transformation is performed and it should be reported. Therefore, we consider a good idea to add a new section about sample fractionation and how the fractioned data has been handled. We have added a new sub-section in the “input data” section, called “3.3 Input data merging”, that captures briefly the fractionation information of the sample(s) and whether the input data files coming from each fraction have been merged prior to quantification. Section 2.2.2: Sample amount: how many ug were used. Added a new section “2.2.2.2 Sample amount” Section 3: Were all samples combined in one batch or timely separated? As we have described in a previous comment (section 2.2.1) a new section has been added “3.3 Input data merging”, and it should include that information. In case of time separation, how is ensured that MS runs do not show run to run differences per se? We consider that the comment of the reviewer is an analytic problem that should be addressed at MS data acquisition time to avoid contaminations between consecutive runs, and therefore it is out of scope of these guidelines. Section 4.1 What about self-made tools, e.g. for spectral counting? Added “In case of in-house developed tools provide a brief description together with a reference if available”. Section 4.3 “peptides with non-quantitative modification” -> it should be explained: M oxidation for example?? Text simplified from: “Also state whether quantitation restricted to unique (i.e. non-shared) peptides, (ii) whether peptides with nonquantitative modifications are quantified, (iii) whether unmodified forms of peptides with non-quantitative modifications are quantified” To: “Also state whether quantitation restricted to unique (i.e. non-shared) peptides or whether some features/peptides have been excluded from the quantification (e.g. containing a non-quantitative modification like oxidized methionine)” Section 4.4 Word quantification missing The text does not refer to the level in which the quantification has been performed. It refers to the level in which data calculation or transformation methods have been applied. Therefore, in order to avoid misunderstandings, we have added “… which level…the data calculation/transformation methods are performed…”. Section 4.4.2 It might be interesting to include whether a ratio factor was used to exclude/include data and why a certain ratio was used. In this section the way in which the ratios are calculated should be described. The selection of a certain ratio value as a threshold should be stated in section 4.5. since it is considered of an estimation of correctness. Section 4.5 See my comment above concerning ratio as selection criterium. According to the comments in section 4.4.2, we have added the text: “Also describe if any threshold value over the ratio has been used in order to consider that a feature/peptide/protein is over/under-expressed, and why that value has been selected.” in this section. Comments from anonymous external reviewer 2: 1. There is no clear difference between points 2.1.1 (and 2.1.2) versus 2.2.2.3 as they all seem to describe the groups and biological and technical replicates, thus incorporating redundant information to the report. Point 2.1 describes the experimental design of the experiments, that is, how samples are organized in groups and/or which replicates (biological and/or technical) have been performed. On the other hand, point 2.2 tries to describe some information related to each one of the samples, that is, its unique name, the label (if any), and also the group/replicate described in section 2.1 that each sample belongs to. In practise, and in most cases, it can be reported as a single table like the one below (with more or less information depending on the study). Sample name control_1 control_2 control_3 control_4 disease_1 disease_2 disease_3 disease_4 tr_disease_1 tr_disease_2 tr_disease_3 tr_disease_4 Sample description Healthy controls Sepsis patient Sepsis patients treated with drug test Group control control control control disease disease disease disease treated disease treated disease treated disease treated disease Type of Replicate Biological Biological Biological Biological Biological Biological Biological Biological Biological Biological Biological Biological Amount 1µg 1µg 1µg 1µg 1µg 1µg 1µg 1µg 1µg 1µg 1µg 1µg However, for more complicated experiments (e.g. samples belonging to several groups, and groups containing several samples), we prefer to keep sections 2.1 and 2.2.2.4 separated in order to have a more flexible specification refering the groups and replicates defined in section 2.1 (2.1.1 and 2.1.2) from the section 2.2.2.3. 2. Comment on 4.1 and 4.2: Currently there are many quantification suites such as Proteome Discoverer and OpenMS. Each of these softwares contain multiple modules and algorithms that can be combined to form different pipelines. Therefore, it is not sufficient to specify the software (e.g. OpenMS v1.9) but also the algorithms that have been used should be clearly stated at this point. This is partially addressed in 4.2 but a clear statement in this regard should be made here. Added “Brief description of the selected analysis pipeline if it is customizable from the software” in the text. 3. Comment on 4.3: There are many different approaches to calculate a confidence score and set a meaningful threshold and there is a whole field in computational proteomics dealing with this issues. All have their strengths and weaknesses and one could discuss which is the best approach, but in any case, it is necessary to specify how the confidence score was determined and how the threshold was set (arbitrary? decoy strategy? binomial distribution?). Moreover, some detail related to the level of filtering could also be added explicitly e.g. Was the filter (fdr?) set at the protein level? Peptide level? etc. Added the text “State if applicable in which level has been applied (PSM, peptide or protein level). This includes the description of how the threshold has been determined (e.g. arbitrary, decoy strategy, binomial distribution, etc…). “. 4. Comment on 5.1.1: In this point it is not clear whether raw data, partially processed data (e.g. normalized and imputed) or final quantitative numbers should be attached here. It would be convenient to specify the initial dataset (raw or partially processed: normalized and imputed) and then, the quantitative numbers referring either to absolute quantitation or relative quantitation. Raw data would contain only values referring to areas, heights or similar, without any confidence score yet. On the other hand processed datasets would include the quantities (relative or absolute) with their confidence intervals, p-values, etc. according to the experimental design. We assume that what we call “primary extracted quantification values from each feature” is actually what the reviewer said about areas, heights or similar, and should be reported, for each feature in section 5.1.1. In some cases, these values can have associated confidence values. However, section 5.1.2 is requesting processed quantification values for peptide level, so for each quantified peptides, a quantification value will be associated, and it will be the result of the aggregation of the feature quantification values reported in 5.1.1 (how to do the aggregation is stated in 4.4.2). Suggestions? 5. Finally some details on how one went from feature intensity to the peptide level and to the protein level should be provided somewhere in point 4 or 5 (e.g. Was a TOP3 approach used? A nested ANOVA model? Average of peptide fold changes?). For example, in SRM there are several ways to go from the transition level to the peptide level: averaging all transitions, summing up all transitions, or even keeping all transitions as individual features to be used as a nested parameter in an ANOVA analysis (see SRMstats from Olga Vitek et col. vs. the strategy used by Steve Carr). Inference of quantitation values from one level to other was basically requested in points 4.4.2 (“aggregation of all features per peptide”), and point 4.4.5. (“Describe how protein quantification values are inferred from peptide quantification values”). We have added some examples described in your comment in the description of section 4.4.2.: “… e.g., aggregation of all features per peptide (average, geometric average, summing, ANOVA analysis)”. In order to clarify the text, we have reformulated the name of section 4.4.5 (now is: “Protein quantification values calculation and / or ratio determination from the peptide quantification values”) in concordance with section 4.4.2. (“Quantification values calculation and / or ratio determination from the primary extracted quantification values”), and added “and / or ratios have been inferred” in section 4.4.5 description. Comments of public reviewer 1: ----------------------------4.4 to 5.1 quant at peptide level: should stay at peptide level unless there is additional evidence for (whole) protein level eg western analysis. The scope of this MIAPE Quant guidelines is focused in mass spectrometry-based quantification, which is always based on a quantification of the peptides and secondly an inference of the protein quantification values from the peptide ones. In case of spectral counting, the protein quantification value can be calculated directly from the peptide spectrum matchings. We therefore think that it is always necessary to state all the process from peptide quant to protein quant, and other evidences of protein quant values may be out of this guidelines scope. Some experiments that are peptide centric have the conundrum of how to report the quantity of peptide and how that translates to quant of protein. It should be mandatory to report the actual peptide sequence (as well as other characteristics such as Mass, charge, rt) from which the quantitation was carried out. All peptide characteristics that are used to quantify the peptides should be reported in 4.2 section. However, it is true that there is not an explicit requirement for linking quantification values to peptide/protein information. It is just requested to report the input data files where all these data should be there. Therefore, for more clarification, we have added the text: “Report each quantification value together with the appropriate information to identify the feature or peptide that is quantified (peptide sequence or peptide sequence plus the charge state, mass, retention time, etc…).”and also “Report each quantification value together with the appropriate information to identify the protein, such as the protein accession.”. For large scale biomarker discovery projects from body fluids for eg- there is not a lot of concordance at the protein level with the peptides quantitated - possibly due to degradation, protease activity etc. Some of those peptides could be components of other proteins. Reporting the sequence, number of peptides quant is based on is important. The protein quantification values inference from peptide quantification values is explicitly requested in section 4.4.5. In fact, “(especially in case of shared peptides)” is added in the explanation in order to request how the protein inference is performed specially in the case of peptides belonging to different proteins. Also, what to do with 'missing' data points eg spectral counting is an obvious eg. Does '0' mean it is absent or just below level of detection- can other criteria be used to report this type of data? What should be reported ... '0' for other quant data may be able to be obtained from baseline subtraction, transformation, peak areas below the threshold. The approach used should also be reported (within section 4.4). Missing value imputation is requested in section 4.4.1. However, we have added the text “Describe (if appropriate) what does a ‘0’ value means, that is, whether it means absent or just below the level of detection, etc…” Final note: We have changed some names of MIAPE sections in the introduction in concordance with section names in the appendix.

Answers to reviewers_MIAPE_Quant_v0.91

Related documents

Products

Support

Answers to reviewers_MIAPE_Quant_v0.91

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib