Systems Biology Data Sharing in SysMO-DB

SysMO-SEEK: Sharing Data and
Models in Systems Biology
Katy Wolstencroft
Stuart Owen
Jacky Snoep
University of Manchester
SysMO-DB Project

A data access, model handling and data integration
platform for Systems Biology:
To support and manage the diversity of data, models and experimental protocols from a consortium
Web based
Standards compliant
Systems Biology of Microorganisms
http://www.sysmo.net


Pan European collaboration
13 individual projects, >100 institutes
Different research outcomes
A cross-section of microorganisms, incl. bacteria, archaea and yeast

Record and describe the dynamic
molecular processes occurring in
microorganisms in a comprehensive way

Present these processes in the form of
computerized mathematical models

Pool research capacities and know-how

Already running since April 2007
Runs for 3-5 years
This year, 2 new projects join and 6 leave


Types of data

Multiple omics: genomics, transcriptomics, proteomics, metabolomics, fluxomics, reactomics
Images
Molecular biology
Reaction Kinetics
Models: metabolic, gene network, kinetic
Relationships between data sets/experiments: procedures, experiments, data, results and models
Analysis of data

Challenges

Heterogeneous data and models
Distributed groups of researchers
Modellers and experimentalists have different skills, training, experience
Scientists want to remain in control
Scientists reluctant to share

Social and technical challenges




SysMO-DB Dev Team
Sergejs Aleksejevs
Wolfgang Müller
Heidelberg Institute for Theoretical Studies, Germany
Carole Goble
Olga Krebs
Katy Wolstencroft
University of Manchester, UK
Stuart Owen
Jacky Snoep
Franco du Preez
University of Stellenbosch, South Africa
University of Manchester, UK
Finn Bacall
Social Challenge: Focus Group
SysMO PALs
Show what is there
Suggest what is possible
Ask for requirements
Give requirements
Tell priorities
Rate outcomes
Suggest improvements
DB team
Double check
Transmit
Disseminate
Collect answers
Focus Group
Projects
Technical Challenge







Rapid and incremental development
Driven by the PALs
Just enough and just in time, not just in case
No reinvention
Sustainable and extensible
Migrate to standards
Fitting in with normal lab practices
What do we share
Protocols for Models
Protocol Title
Authors
Keywords
Description
Assumptions
Equations
Numerical Methods/Algorithms
Computational Tools
Parameter Estimation Techniques
Limitations
References
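The headings of the model protocol template above could be held in a simple record. The following is a minimal sketch only: the class name `ModelProtocol`, the field names and the `missing_sections` helper are illustrative assumptions, not part of SysMO-SEEK.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ModelProtocol:
    """Hypothetical record mirroring the 'Protocols for Models' headings."""
    title: str
    authors: List[str]
    keywords: List[str] = field(default_factory=list)
    description: str = ""
    assumptions: str = ""
    equations: str = ""
    numerical_methods: str = ""
    computational_tools: str = ""
    parameter_estimation: str = ""
    limitations: str = ""
    references: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Example: a partly filled protocol reports which sections remain to be written.
protocol = ModelProtocol(
    title="Example model",
    authors=["A. Modeller"],
    equations="dS/dt = -k*S",
)
print(protocol.missing_sections())
```

Reporting empty sections rather than rejecting the record fits the "just enough" approach: a protocol can be shared early and completed later.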
All SysMO Assets: Methods + Models + Data + Results
A Tree View of Assets
Investigation
Studies
SOP
Assay
SOP
ISA infrastructure provides a
directory structure for
experiments
http://isatab.sourceforge.net/
SOP
Construction
Validation
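The Investigation, Study, Assay hierarchy of the tree view could be sketched as nested records. This is an illustrative assumption about the shape of the tree, not the ISA-TAB format itself or the SEEK implementation; the class names, the choice to attach SOPs to assays only, and `render_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    name: str
    sops: List[str] = field(default_factory=list)  # SOP titles used by this assay


@dataclass
class Study:
    name: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    name: str
    studies: List[Study] = field(default_factory=list)


def render_tree(investigation: Investigation) -> List[str]:
    """Return one indented line per node, top-down."""
    lines = [investigation.name]
    for study in investigation.studies:
        lines.append("  " + study.name)
        for assay in study.assays:
            lines.append("    " + assay.name)
            for sop in assay.sops:
                lines.append("      SOP: " + sop)
    return lines


# Example with placeholder names.
tree = Investigation("Investigation", [Study("Study", [Assay("Assay", ["Example SOP"])])])
print("\n".join(render_tree(tree)))
```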
Incentives for sharing




Safe haven for data
Credit and attribution
Help with exporting to public repositories (e.g. one-click export to ArrayExpress, PRIDE, etc.)
A repository for “supplementary materials” in publications


Linking publications and data
Access other resources through a SEEK gateway
Just Enough Sharing
Access
Permissions
...we don’t talk about security
Just Enough sharing
SysMOLab
Wiki
COSMIC
Fetch on
Request
Alfresco
MOSES
Wiki
ANOTHER
Direct
Upload
A DATA
STORE
SOP
How do we share
“Just Enough Results Model”

What type of data is it: microarray, growth curve, enzyme activity…
What was measured: gene expression, OD, metabolite concentration…
What do the values in the datasets mean: units, time series, repeats…
Based on:

Minimum information models
e.g. MIAME, MIAPE, MIRIAM

Biological ontologies
e.g. Gene Ontology, MGED, SBO

Bioportal web service used in SysMO-SEEK for:
Concept lookup and visualisation
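The three questions of the "Just Enough Results Model" (what type of data, what was measured, what the values mean) could be captured in a small metadata record. This is a sketch under stated assumptions: `JermRecord`, its fields and `check_record` are hypothetical names for illustration and do not reproduce the actual JERM schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JermRecord:
    data_type: str         # what type of data is it, e.g. "growth curve"
    measured: str          # what was measured, e.g. "OD"
    units: Dict[str, str]  # what the values mean: column name -> unit
    repeats: int = 1       # number of experimental repeats


def check_record(record: JermRecord, columns: List[str]) -> List[str]:
    """List what is still missing before the dataset is 'just enough' described."""
    problems = []
    if not record.data_type:
        problems.append("data type missing")
    if not record.measured:
        problems.append("measured quantity missing")
    for column in columns:
        if column not in record.units:
            problems.append(f"no unit for column '{column}'")
    return problems


# Example: the OD column has no unit yet, so one problem is reported.
record = JermRecord(data_type="growth curve", measured="OD", units={"time": "min"})
print(check_record(record, ["time", "OD"]))
```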
How do we share

Share JERM templates developed by SysMO-DB,
PALs and consortium



Spreadsheet templates
Database Schemas
Encourage uptake throughout SysMO



transcriptomics
metabolomics
proteomics etc….
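A shared spreadsheet template might pair fixed column headers with a short list of permitted terms for some columns. The sketch below uses only the Python standard library; the column names and the permitted terms are assumptions chosen for illustration (the terms are taken from the data types named earlier), not an actual SysMO-DB template.

```python
import csv
import io
from typing import List, Optional, Tuple

# Hypothetical template columns and controlled terms.
TEMPLATE_COLUMNS = ["sample_id", "data_type", "measured", "value", "unit"]
ALLOWED = {"data_type": {"microarray", "growth curve", "enzyme activity"}}


def blank_template() -> str:
    """Return the template as CSV text containing only the header row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(TEMPLATE_COLUMNS)
    return buffer.getvalue()


def invalid_cells(sheet_text: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (row number, column, value) for cells outside the permitted terms."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    bad = []
    for row_number, row in enumerate(reader, start=2):  # the header is row 1
        for column, allowed in ALLOWED.items():
            if row.get(column) not in allowed:
                bad.append((row_number, column, row.get(column)))
    return bad


# Example: the second data row uses a term that is not in the permitted list.
sheet = blank_template() + "s1,growth curve,OD,0.4,AU\ns2,western blot,band,1,AU\n"
print(invalid_cells(sheet))
```

Flagging cells instead of refusing the file keeps the template a recommendation rather than a gate.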
RightField: Annotation by Stealth
Identifying Biological Objects

What do you have in your data? Proteins/enzymes, genes/expression levels, metabolites
Where/how do these objects interact? Pathways, flux, experimental conditions
What models describe these interactions
Possible when using common frameworks, naming schemes and controlled vocabularies
Following Standards

We recommend formats but we do not enforce them
Protocols and SOPs – Nature Protocols
Data – JERM models and community minimum information models
Models – SBML and related standards
Publications – PubMed and DOI
If you follow the prescribed formats, you get more out, but if you don’t, you can still participate
Lowering the adoption barrier
SEEK, the eLaboratory
A dynamic resource for analysis as well as browsing




Automatic comparison of data from inside files
Understanding where and how data and models are linked
Running simulations with new experimental data
Running analyses and workflows over the data and models
Workflows from myExperiment
Data preparation, annotation and analysis
Systems Biology workflow Pack on myExperiment
Microarray analysis and text mining

Created by Afsaneh Maleki-Dizaji from SUMO, University of Sheffield
Based on previous work by Paul Fisher, University of Manchester
http://www.myexperiment.org/workflows/187
SEEK as a data analysis and
meta analysis service

SBML model construction and population

Calibration workflow
Data requirements:
Parameterised SBML model
Experimental data: metabolite concentrations from key results database
Calibration by COPASI web service
Peter Li
Data analysis and meta analysis
SEEK Analysis Service with pre-cooked analysis tools.


Calibration workflow
Data requirements:
Parameterised SBML model
Experimental data: metabolite concentrations from key results database
Load model:
Load data:
GO
Calibration by COPASI web service
Peter Li
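Calibration here means adjusting model parameters so that simulated metabolite concentrations match the measured ones. The sketch below illustrates the idea on a toy case and does not call COPASI or any web service: the one-parameter decay model S(t) = S0 * exp(-k * t), the function name `fit_decay_rate` and the synthetic data are all assumptions made for illustration.

```python
import math
from typing import Sequence


def fit_decay_rate(times: Sequence[float], concentrations: Sequence[float]) -> float:
    """Estimate k in S(t) = S0 * exp(-k * t) by linear least squares on ln S."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    # The slope of ln S against t is -k.
    return -sxy / sxx


# Synthetic "experimental" data generated with k = 0.5 and S0 = 2.0.
times = [0.0, 1.0, 2.0, 3.0]
concentrations = [2.0 * math.exp(-0.5 * t) for t in times]
print(round(fit_decay_rate(times, concentrations), 6))
```

A real SBML model has many parameters and nonlinear dynamics, which is why the slides delegate the fitting to a dedicated tool rather than a closed-form estimate like this one.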
Why it works for us



A solution that fits in with current practices
Start simple, show benefits, add more
Engage with the people actually doing the work

PhD students, Post-docs

Build to the PALs' requirements
Respect publication cycles
Respect cultural differences

Scientists stay in control


SysMO Methods Spreading

Virtual Liver








Mueller, via HITS
Lungsys
SBCancer
EraSysBio+
Eukaryotic organisms
Interactions between host and pathogen
Human disease
Multi scale modelling
Acknowledgements




SysMO-DB Team
SysMO-PALS
myGrid, HITS and JWS Online
EMBL-EBI, MCISB
http://www.sysmo-db.org