PSI-ProteomeInformaticsUpdate

PSI-Proteome Informatics update
Andy Jones
PSI 2013 Liverpool
PSI-PI outputs
• Formats and guidelines for proteome informatics
• Standard formats:
– mzIdentML
– mzQuantML
– mzTab
• Reporting guidelines
– MIAPE MSI
– MIAPE Quant
MIAPE documents
• Originally one MIAPE document:
– MIAPE Mass Spectrometry Informatics (MSI), containing both identification guidelines and quant guidelines
– Now split into MIAPE MSI (identification only) and MIAPE Quant
• MIAPE MSI status
– MIAPE MSI 1.1 published in 2008
– Working group (2011-2012): minor updates to requirements and removal of quant parts
– MIAPE MSI 1.2 still needs to be re-submitted to the PSI process
• Plan for meeting:
– Final issues with MIAPE MSI and alignment with
mzIdentML?
MIAPE Quant timeline
• Work started in Dec 2010 by ProteoRed groups of experts
• Shared with PSI working groups in March 2011
• Revision at PSI meeting (Heidelberg), April 2011
• PSI review:
– Public and external review ended in August 2012
– Major revision accepted in October 2012
• Journal of Proteomics:
– Submitted on 15th February
– Accepted on 27th February after minor revision
Martínez-Bartolomé, S., Deutsch, E. W., Binz, P.-A., Jones, A. R., Eisenacher, M., Mayer, G.,
Campos, A., Canals, F., Bech-Serra, J.-J., Carrascal, M., Gay, M., Paradela, A., Navajas, R.,
Marcilla, M., Hernáez, M. L., Gutiérrez-Blázquez, M. D., Velarde, L. F. C., Aloria, K.,
Beaskoetxea, J., Medina-Aunon, J. A., and Albar, J. P. Guidelines for reporting quantitative mass
spectrometry based experiments in proteomics. Journal of Proteomics, 2013 in press.
http://www.sciencedirect.com/science/article/pii/S1874391913001024
No planned work for meeting
mzIdentML
• Timeline:
– Original 1.0 version in Aug 2009
– Version 1.1 stable (Aug 2011)
– Manuscript published in MCP in 2012
• PSI 2013 To do list:
– Updates to protein grouping
– PTM localisation / ambiguity scoring
– General discussion of data compression issues
Jones, A. R., Eisenacher, M., Mayer, G., Kohlbacher, O., Siepen, J., Hubbard, S., Selley, J., Searle, B., Shofstahl, J., Seymour, S., Julian, R.,
Binz, P.-A., Deutsch, E. W., Hermjakob, H., Reisinger, F., Griss, J., Vizcaino, J. A., Chambers, M., Pizarro, A., and Creasy, D. (2012) The
mzIdentML data standard for mass spectrometry-based proteomics results. Molecular & Cellular Proteomics 11, M111.014381.
Tooling for mzIdentML
Tool | Import/Export | Status
Mascot | EXPORT | mzIdentML version 1.0 available in Mascot version 2.3; stable mzid version 1.1 available in Mascot version 2.4
OMSSA | EXPORT | http://code.google.com/p/mzidentml-lib/ from the U. Liverpool group
MSGF+ | EXPORT | Full support for results into mzid 1.1
Peaks | EXPORT | Native export of mzIdentML version 1.1
Phenyx | EXPORT | Exporter to mzIdentML v1.1 now available - contact GeneBio for details.
PLGS | |
ProCon | EXPORT | Conversion of SEQUEST *.out into mzIdentML (SpectrumIdentificationResults only); conversion of ProteomeDiscoverer 1.2 + 1.3 (Thermo) *.msf and *.prot.xml files into mzIdentML
ProteinPilot | | Contact is Sean Seymour
ProteinScape | | Work in progress
ProteoWizard | IMPORT AND EXPORT | pepXML converter available now - impl. of C++ library for reading/writing mzIdentML / interface for importing other formats
SEQUEST - Native | | ProCon (see above)
SEQUEST - BioWorks | | Work in progress
SEQUEST - Proteome Discoverer | | Work in progress (exporters available from ProCon project)
SpectraST | EXPORT | ProteoWizard conversion from pepXML
Spectrum Mill | |
X!Tandem | EXPORT | http://code.google.com/p/mzidentml-lib/
OpenMS | IMPORT AND EXPORT | Fully supported in release 1.9
Scaffold | EXPORT | Available now in Scaffold version 3.0
Scaffold PTM | IMPORT | Scaffold PTM tool imports identifications in mzIdentML
TPP | IMPORT AND EXPORT | pepXML to mzid from ProteoWizard
MIAPE MSI Extractor | IMPORT | Tool available from the ProteoRed team
CSV exporter | IMPORT | http://code.google.com/p/mzidentml-lib/
PAAnalyzer | IMPORT AND EXPORT | Imports and exports mzIdentML (only v1.0)
Myrimatch | EXPORT | Identifications exported in mzIdentML
TagRecon | EXPORT | Identifications exported in mzIdentML
Pepitome | EXPORT | Identifications exported in mzIdentML
IDPicker | IMPORT | Version 3.x implements mzIdentML import
jmzIdentML | IMPORT AND EXPORT | Java API for reading and writing mzIdentML
Formats
• mzQuantML
– Output of quantitative software
– Quantitative values about proteins, protein
groups, peptides and features (quantified regions
on mass spec) also small molecules...
• Relative or absolute values for single samples (Assays)
or groups of replicates (StudyVariables)
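The Assay / StudyVariable distinction can be illustrated with a minimal Python sketch. This is not the mzQuantML schema or any PSI API: the class names, fields, accession and values below are hypothetical, chosen only to show how replicate assays roll up into a study variable and how a relative value could be derived from two groups.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class Assay:
    """One quantified sample or run (hypothetical, simplified model)."""
    assay_id: str
    protein_values: Dict[str, float]  # protein accession -> quantitative value


@dataclass
class StudyVariable:
    """A named group of replicate assays (hypothetical, simplified model)."""
    name: str
    assays: List[Assay] = field(default_factory=list)

    def mean_value(self, accession: str) -> float:
        # Average the protein's value over the replicates that report it.
        values = [a.protein_values[accession]
                  for a in self.assays if accession in a.protein_values]
        return mean(values)


# Made-up values for a single made-up accession.
control = StudyVariable("control", [Assay("a1", {"P12345": 10.0}),
                                    Assay("a2", {"P12345": 14.0})])
treated = StudyVariable("treated", [Assay("a3", {"P12345": 30.0}),
                                    Assay("a4", {"P12345": 18.0})])

# Relative value between two groups of replicates.
ratio = treated.mean_value("P12345") / control.mean_value("P12345")
print(ratio)
```

In the real format, the same idea is expressed through XML elements and references rather than in-memory objects.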
mzQuantML status
• Version 1.0 rc-1 submitted to the PSI process October 2011
• Version 1.0 rc-2 June 2012
• Re-submitted to PSI process in October 2012 & manuscript submitted to MCP, minor correction received
• Completed PSI process in Feb 2013 – version 1.0 release
– Supports label-free (intensity), label-free (spectral counting), MS2 tag techniques (e.g. iTRAQ) and MS1 label techniques (e.g. SILAC)
– Schema is fixed with each technique defined by separate semantic rules, implemented in validator software
– Manuscript re-submitted to MCP, awaiting outcome
Implementations
• Java API for creating example files (version 1.0 release): http://code.google.com/p/jmzquantml/
• Java-based validator (version 1.0 release): http://code.google.com/p/mzquantml-validator/
• Software for converting output files from MaxQuant and Progenesis:
– Qi, D et al. OMICS 16(9): 489-495; http://code.google.com/p/maxquant-mzquantml-convertor/
• Implementation in OpenMS for some techniques
• Beta Java library of routines inc. mzTab exporter: http://code.google.com/p/mzq-lib/
• Beta Excel to mzQuantML converter for spectral count data: http://code.google.com/p/tsv-or-csvmzquantml-converter/
Mzq To do list:
• Need to add SRM support
– Local testing of SRM encoding and conversion from Skyline
– Need wider input on our mapping and writing semantic rules for software
• Need to check whether protein grouping and mod scoring map onto format okay
mzTab
• To provide a simple and efficient way of exchanging results from MS approaches.
– Simple summary “final” report of the experimental results; peptides and proteins identified and quantified
– Small molecules included (metabolomics)
– Technical and biological metadata
– Spectra can be referenced in optional columns.
– Set of mandatory and optional attributes (very flexible).
• Four sections:
– (Optional) Metadata section
– (Optional) Protein section
– (Optional) Peptide section
– (Optional) Small Molecule section (metabolomics)
• Can report MS derived data at different levels:
– Single experiments
– Multiple (possibly linked) experiments (merged files)
– Data generated as a result of a query to a bioinformatics resource
– Possible to add a reliability score for each identification
• Easy to parse and use by the research community, systems biologists as well as providers of knowledge bases. It can be used by non-experts in bioinformatics and/or proteomics.
• http://mztab.googlecode.com
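To show why a tab-delimited, sectioned layout is easy to parse, here is a minimal Python sketch that splits an mzTab-like text into its sections. It assumes the three-letter line prefixes used in the mzTab specification (MTD for metadata; PRH/PRT, PEH/PEP and SMH/SML for the protein, peptide and small molecule header and data rows). The function name, the metadata key and the column names in the example are illustrative and not spec-complete; this is not one of the reference implementations.

```python
from collections import defaultdict

# Line prefix -> section name (prefixes as used in the mzTab specification).
PREFIX_TO_SECTION = {
    "MTD": "metadata",
    "PRH": "protein", "PRT": "protein",
    "PEH": "peptide", "PEP": "peptide",
    "SMH": "small_molecule", "SML": "small_molecule",
}
HEADER_PREFIXES = {"PRH", "PEH", "SMH"}


def split_mztab(text):
    """Return (metadata dict, {section: [row dicts]}) for an mzTab-like string."""
    metadata = {}
    headers = {}
    rows = defaultdict(list)
    for line in text.splitlines():
        fields = line.split("\t")
        prefix, values = fields[0], fields[1:]
        if prefix not in PREFIX_TO_SECTION:
            continue  # skip blank lines, comments and anything unrecognised
        section = PREFIX_TO_SECTION[prefix]
        if prefix == "MTD":
            if len(values) >= 2:
                metadata[values[0]] = values[1]
        elif prefix in HEADER_PREFIXES:
            headers[section] = values
        else:
            # Pair each data value with the column name from the section header.
            rows[section].append(dict(zip(headers.get(section, []), values)))
    return metadata, dict(rows)


# Illustrative content only; a real file carries many more columns.
example = "\n".join([
    "MTD\ttitle\tExample study",
    "PRH\taccession\tdescription",
    "PRT\tP12345\tExample protein",
    "PEH\tsequence\taccession",
    "PEP\tELVISK\tP12345",
])

metadata, sections = split_mztab(example)
print(metadata)
print(sections["protein"])
```

Because every section is optional, the sketch simply omits sections that never appear instead of treating their absence as an error.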
mzTab status
• Submitted to the PSI document process in May 2012.
• TO DO: address the remaining (minor) comments after the second round of review.
• We hope that version 1.0 will soon be formalised.
• Publication (revised version) under review in MCP.
• Current implementations:
– jmzTab (Java API): 2 versions have been developed. Version 2.0 (Q.W. Xu, about to be
finished, with more functionality) is going to be the maintained version. Version 1.0 (J.
Griss) will not be further maintained.
– mzTab Validator, PRIDE XML to mzTab converter and mzTab merger in beta status.
– PRIDE Converter 2.
– OpenMS (version 1.10)
– R/Bioconductor package MSnbase (L. Gatto, Cambridge University)
– LipidDataAnalyzer (University of Graz)
– MetaboLights (EBI) and COSMOS EU project: a slightly modified version is being used right now. Working in contact with them.
• End-meeting summary...
PSI-PI work done
• mzIdentML
– Minor schema issues:
• Optional attribute (dBSequence_ref) on ProteinDetectionHypothesis (would be better if mandatory)
– update spec doc encouraging best practice
• Pre-fractionation:
– update spec doc encouraging best practice – one SpectrumIdentificationList where possible
• Retention time reporting:
– Update spec doc encouraging best practice; align with mzML CVs
– Support for Crosslinking results
• Sketched a possible reporting format that looks to cover most simple cases
• Needs (considerable) further testing in local implementations and follow up by calls
– Mod localisation
• Sketched some possible encodings
• Needs follow up calls and implementation in software
• Keen to build this support into mzid 1.1 but model is going to be a work-around.
– Protein grouping
• Reported back current progress of working group (key members not present here)
• New members will join the working group
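The schema issue noted above for ProteinDetectionHypothesis (the reference to the database sequence being optional) can be checked in a file with a few lines of Python. This is a hedged sketch: it assumes the mzIdentML 1.1 namespace URI and the schema spelling dBSequence_ref, the function name is hypothetical, and the XML fragment is a cut-down illustration rather than a schema-valid document.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of mzIdentML 1.1 documents.
MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"


def hypotheses_missing_db_ref(xml_text):
    """Return ids of ProteinDetectionHypothesis elements without dBSequence_ref."""
    root = ET.fromstring(xml_text)
    tag = "{%s}ProteinDetectionHypothesis" % MZID_NS
    return [el.get("id") for el in root.iter(tag)
            if el.get("dBSequence_ref") is None]


# Cut-down fragment for illustration only (not schema-valid).
example = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <ProteinAmbiguityGroup id="PAG_1">
    <ProteinDetectionHypothesis id="PDH_1" dBSequence_ref="DBSeq_1" passThreshold="true"/>
    <ProteinDetectionHypothesis id="PDH_2" passThreshold="true"/>
  </ProteinAmbiguityGroup>
</MzIdentML>"""

print(hypotheses_missing_db_ref(example))
```

A check like this is one way an exporter could follow the best-practice recommendation while the attribute remains optional in the schema.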
PSI-PI work done
• mzQuantML
– Sketched SRM example files for label-based
encoding
– Need sketched example for label-free but seems
straightforward
– Plan to build export software very soon from
Skyline (prototype already done) and mProphet
– Write up semantic encoding rules
– Submit to PSI doc process as a Community
Practice document
PSI-PI work done
mzTab
• Finalised minor issues from second round of
PSI doc process review
• Deadline 1st May for re-submitting final
“release 1.0” document
– Needs minor updates to document and example
files