SysMO-SEEK: Sharing Data and Models in Systems Biology
  Katy Wolstencroft, Stuart Owen, Jacky Snoep
  University of Manchester

SysMO-DB Project
  A data access, model handling and data integration platform for Systems Biology
  To support and manage the diversity of data, models and experimental protocols from a consortium
  Web based
  Standards compliant

Systems Biology of Microorganisms
  http://www.sysmo.net
  Pan European collaboration
  13 individual projects, >100 institutes
  Different research outcomes
  A cross-section of microorganisms, incl. bacteria, archaea and yeast
  Record and describe the dynamic molecular processes occurring in microorganisms in a comprehensive way
  Present these processes in the form of computerized mathematical models
  Pool research capacities and know-how
  Already running since April 2007
  Runs for 3-5 years
  This year, 2 new projects join and 6 leave

Types of data
  Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics
  Images
  Molecular biology
  Reaction kinetics
  Models: metabolic, gene network, kinetic
  Relationships between data sets/experiments: procedures, experiments, data, results and models
  Analysis of data

Challenges
  Heterogeneous data and models
  Distributed groups of researchers
  Modellers and experimentalists have different skills, training, experience
  Scientists want to remain in control
  Scientists reluctant to share
  Social and technical challenges

SysMO-DB Dev Team
  Sergejs Aleksejevs, Wolfgang Müller, Carole Goble, Olga Krebs, Katy Wolstencroft, Stuart Owen, Jacky Snoep, Franco du Preez, Finn Bacall
  Heidelberg Institute for Theoretical Studies, Germany
  University of Manchester, UK
  University of Stellenbosch, South Africa

Social Challenge: Focus Group
  Participants: DB team, Focus Group (SysMO PALs), Projects
  Show what is there
  Suggest what is possible
  Ask for requirements
  Give requirements
  Tell priorities
  Rate outcomes
  Suggest improvements
  Double check
  Transmit
  Disseminate
  Collect answers

Technical Challenge
  Rapid and incremental development
  Driven by the PALs
  Just enough and just in time, not just in case
  No reinvention
  Sustainable and extensible
  Migrate to standards
  Fitting in with normal lab practices

What do we share
  Methods + Models + Data + Results: All SysMO Assets
  Protocols for Models
    Protocol Title
    Authors
    Keywords
    Description
    Assumptions
    Equations
    Numerical Methods/Algorithms
    Computational Tools
    Parameter Estimation Techniques
    Limitations
    References

A Tree View of Assets
  ISA infrastructure provides a directory structure for experiments
  http://isatab.sourceforge.net/
  Tree labels: Investigation, Studies, Assay, SOP, Construction, Validation

Incentives for sharing
  Safe haven for data
  Credit and attribution
  Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE etc.)
  A repository for “supplementary materials” in publications
  Linking publications and data
  Access other resources through a SEEK gateway

Just Enough Sharing
  Access Permissions
  ...we don’t talk about security

Just Enough Sharing
  Sources: SysMOLab Wiki, COSMIC, Alfresco, MOSES Wiki, ANOTHER, A DATA STORE
  Fetch on Request
  Direct Upload
  SOP

How do we share
  “Just Enough Results Model”
  What type of data is it: microarray, growth curve, enzyme activity…
  What was measured: gene expression, OD, metabolite concentration…
  What do the values in the datasets mean: units, time series, repeats…
  Based on:
    Minimum information models, e.g. MIAME, MIAPE, MIRIAM
    Biological ontologies, e.g. Gene Ontology, MGED, SBO
  Bioportal web service used in SysMO-SEEK for concept lookup and visualisation

How do we share
  Share JERM templates developed by SysMO-DB, PALs and consortium
  Spreadsheet templates
  Database schemas
  Encourage uptake throughout SysMO
  Transcriptomics, metabolomics, proteomics etc.

RightField: Annotation by Stealth

Identifying Biological Objects
  What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites
  Where/how do these objects interact? Pathways, flux, experimental conditions
  What models describe these interactions?
  Possible when using common frameworks, naming schemes and controlled vocabularies

Following Standards
  We recommend formats but we do not enforce them
  Protocols and SOPs: Nature Protocols
  Data: JERM models and community minimum information models
  Models: SBML and related standards
  Publications: PubMed and DOI
  If you follow the prescribed formats, you get more out, but if you don’t, you can still participate
  Lowering the adoption barrier

SEEK, the eLaboratory
  A dynamic resource for analysis as well as browsing
  Automatic comparison of data from inside files
  Understanding where and how data and models are linked
  Running simulations with new experimental data
  Running analyses and workflows over the data and models

Workflows from myExperiment
  Data preparation, annotation and analysis
  Systems Biology workflow Pack on myExperiment
  Microarray analysis and text mining
  Created by Afsaneh Maleki-Dizaji from SUMO, University of Sheffield
  Based on previous work by Paul Fisher, University of Manchester
  http://www.myexperiment.org/workflows/187

SEEK as a data analysis and meta analysis service
  SBML model construction and population
  Calibration workflow
  Data requirements
    Parameterised SBML model
    Experimental data
    Metabolite concentrations from key results database
  Calibration by COPASI web service
  Peter Li

Data analysis and meta analysis
  SEEK Analysis Service with pre-cooked analysis tools.
  Calibration workflow
  Data requirements
    Parameterised SBML model
    Experimental data
    Metabolite concentrations from key results database
  Form: Load model, Load data, GO
  Calibration by COPASI web service
  Peter Li

Why it works for us
  A solution that fits in with current practices
  Start simple, show benefits, add more
  Engage with the people actually doing the work: PhD students, post-docs
  Build to the PALs' requirements
  Respect publication cycles
  Respect cultural differences
  Scientists stay in control

SysMO Methods Spreading
  Virtual Liver (Mueller, via HITS)
  Lungsys
  SBCancer
  EraSysBio+
  Eukaryotic organisms
  Interactions between host and pathogen
  Human disease
  Multi scale modelling

Acknowledgements
  SysMO-DB Team
  SysMO-PALs
  myGrid, HITS and JWS Online
  EMBL-EBI, MCISB
  http://www.sysmo-db.org
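
Illustration: a "just enough" data description
  The "Just Enough Results Model" slide asks three things of every data set: what type of data it is, what was measured, and what the values mean (units, time series, repeats). The sketch below shows one way such a minimal description could be held and checked for gaps. It is not the SysMO-SEEK implementation; the class name JermRecord, its field names and the helper missing_fields are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class JermRecord:
    """Hypothetical 'just enough' description of one data set (not the real JERM schema)."""
    data_type: Optional[str] = None     # e.g. "microarray", "growth curve", "enzyme activity"
    measured: Optional[str] = None      # e.g. "gene expression", "OD", "metabolite concentration"
    units: Optional[str] = None         # what the values mean, e.g. "mM"
    time_series: Optional[bool] = None  # True if the values are ordered in time
    repeats: Optional[int] = None       # number of replicate measurements


def missing_fields(record: JermRecord) -> List[str]:
    """Return the names of the fields that have not been filled in yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# A partially described data set: units and repeats are still unknown.
record = JermRecord(
    data_type="enzyme activity",
    measured="metabolite concentration",
    time_series=True,
)
print(missing_fields(record))  # → ['units', 'repeats']
```

  Reporting the gaps rather than rejecting the record matches the deck's stance that formats are recommended but not enforced: an incomplete description can still be shared and completed later.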