CoLIMS progress

Computational Omics and Systems Biology (CompOmics) Group
Niels Hulstaert
niels.hulstaert@ugent.be

outline
• predecessor: ms-lims
• database schema
• architecture
• status
• in the pipeline
• bumpy road
• demo

ms-lims lifetime growth
[Figure: millions of spectra (axis 0 to 140) and identification ratio (axis 0% to 100%) over time, 08-2003 to 02-2013]

ms-lims usage
[Diagram labels: Agilent HPLC; instruments Micromass Q-TOF I, Bruker Ultraflex, Bruker Esquire HCT, Applied 4X00; formats A to D; MS or MS/MS analysis; spectra; MySQL DB; identification (Matrix Science Mascot); results interpretation; consumers 1 to 3]

time for an update
• Mascot centric
• no MaxQuant support
• database schema limitations
• hard to maintain
    - legacy code
    - memory issues
    - cyclic dependencies
• minimalist GUI

ms-lims-X -> CoLIMS
• take the good things (and start from scratch)
    - rich client
    - straightforward installation
    - lightweight
• PeptideShaker support
• MaxQuant support
• ProteomeXchange/PRIDE support
• more mature database schema
    - unique protein sequences
    - unique modifications

database schema
[Diagram: schema areas: metadata, search input, identification results, quantification, user management]

architecture
[Diagram labels: in-house client; storage task server; ActiveMQ; storage engine; database server; colims DB; modules colims-client, colims-distributed, colims-core, colims-repository, colims-model]
• JMS and JMX Java technologies
• widely used and proven to be a stable component in distributed architectures
• loose coupling of clients and storage engine
• sequential storing: unique protein and modification tables
• transactional and retry mechanism

quantification status
• in progress: MaxQuant import functionality
    - need for validator
• in the pipeline
    - Mascot quant support
    - first: mzTab support
    - later: mzQuantML support

supported search engines
• MaxQuant
• in the pipeline: native Mascot support
• PeptideShaker: MS-GF+, OMSSA, X!Tandem, MS Amanda and Mascot

ProteomeXchange export
• PRIDE XML
• mzIdentML
• PeptideShaker imported data in ProteomeXchange/PRIDE
    - 93 submissions, comprising 11 408 817 spectra
    - 50 submissions are public, containing 3 774 937 spectra
    - 122 675 spectra on average per PeptideShaker project

in the pipeline
• PeptideShaker-like data viewer
• data query tool
• native ProteomeXchange/PRIDE export (mzML, mzIdentML, mzTab)
• built-in distributed search architecture and identification interpretation (SearchGUI/PeptideShaker)
• improve client – storage task server interaction
• replace ms-lims and import existing data
• web interface
    - third party access

design bumps
• ActiveMQ instead of in-house solution
• various database schema changes
• auditing issues
• unique protein accession -> unique sequence

adapting to PeptideShaker
• fast release cycles
• PSI-MOD -> UNIMOD modifications (multi search engines)
• protein inference strategy (protein tree)

adapting to MaxQuant
• no access to used FASTA
• spectral matching across searches
• black box

DEMO
http://colims.googlecode.com
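
The architecture slide's points on sequential storing (unique protein and modification tables) and on the retry mechanism can be illustrated with a small sketch. This is not CoLIMS code: `ProteinStore`, `store_with_retry`, `run_storage_engine` and `TransientStorageError` are hypothetical names, an in-memory dict stands in for the database, and a single worker thread draining a queue stands in for the storage engine behind the message broker. It assumes that the point of storing sequentially is to keep the unique tables free of duplicate rows.

```python
import queue
import threading


class TransientStorageError(Exception):
    """Hypothetical error standing in for a failed database transaction."""


class ProteinStore:
    """In-memory stand-in for a table of unique protein sequences."""

    def __init__(self):
        self._id_by_sequence = {}

    def get_or_create(self, sequence):
        # Key on the sequence, not the accession, so identical
        # sequences share a single row.
        if sequence not in self._id_by_sequence:
            self._id_by_sequence[sequence] = len(self._id_by_sequence) + 1
        return self._id_by_sequence[sequence]

    def __len__(self):
        return len(self._id_by_sequence)


def store_with_retry(task, max_attempts=3):
    """Run a storage task, retrying when it raises TransientStorageError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except TransientStorageError:
            if attempt == max_attempts:
                raise


def run_storage_engine(tasks, results):
    """Drain the task queue one task at a time (sequential storing)."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more tasks
            break
        results.append(store_with_retry(task))


store = ProteinStore()
tasks = queue.Queue()
results = []

# The same sequence arrives under two different accessions.
submissions = [("P1", "MKTAYIAK"), ("Q9", "MKTAYIAK"), ("P2", "MSLLTEVE")]
for _accession, sequence in submissions:
    tasks.put(lambda s=sequence: store.get_or_create(s))
tasks.put(None)

worker = threading.Thread(target=run_storage_engine, args=(tasks, results))
worker.start()
worker.join()

print(results)     # row ids in submission order
print(len(store))  # number of unique sequences stored
```

Because only one worker touches the store, the look-up-then-insert step needs no locking. With several concurrent writers the same check would need a database-level unique constraint or a lock.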
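
The PeptideShaker submission figures quoted under ProteomeXchange export can be checked with a short calculation. The numbers are taken from the slide; the variable names are only for illustration.

```python
total_submissions = 93
total_spectra = 11_408_817
public_submissions = 50
public_spectra = 3_774_937

# Average spectra per PeptideShaker project, rounded down.
average_per_project = total_spectra // total_submissions

# Share of submissions and of spectra that are public.
public_submission_share = public_submissions / total_submissions
public_spectra_share = public_spectra / total_spectra

print(average_per_project)  # 122675
print(f"{public_submission_share:.1%} of submissions are public")
print(f"{public_spectra_share:.1%} of spectra are public")
```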