Just enough exchange for Systems Biology Data and Models.

SysMO-DB: Just Enough Exchange for Systems Biology Data and Models
Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester
Wolfgang Müller, O. Krebs, Isabel Rojas - EML Research gGmbH (not for profit)
Jacky Snoep - University of Stellenbosch
MS eScience Workshop, Pittsburgh, PA
SysMO = SYStems biology of MicroOrganisms
11 projects, 91 partners, 9 countries, started 2007
[Figure: map of participating countries, labelled with the counts 4, 9, 1, 2, 22, 29, 2]
SysMO-DB
• Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites
• Sensitively retrofit a data access, model handling and data integration platform.
• Support and manage the diversity of data, models and competencies.
• Web-based solution:
  • exchange of data, models and processes (intra- and inter-consortia).
  • search for data, models and processes across the initiative.
  • dissemination of results.

SysMO-DB Team
• EML Research gGmbH, Germany: Wolfgang Müller, Isabel Rojas, Olga Krebs
• University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs
• University of Stellenbosch, South Africa: Jacky Snoep
Connect projects, connect to outside
Public
Outside data and tools
SysMO-DB, inter-project
Project
Project specific solutions
Internally used tools & data
Personal
My Disk: Data
Models
Workflows
Own solutions
Own data solutions and collaboration environments: wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke, commercial … files and spreadsheets.
Suspicion
Suspicion and caution over sharing.
Interesting interplay between modellers,
experimentalists and bioinformaticians.
Data issues
Many do not have data, do not follow the standards that exist, or do not know who is doing what.
Much of the data cannot be compared: different organisms, different strains.
Resource Issues
No extra resources for the consortiums
91 institutes, 11 consortiums, some overlapping
Principles…
• Go for a series of small victories
• Realistic
• Don't reinvent
• Migrate to standards
• Sustainable and extensible
• Provide instant gratification
• Address doubt and anxiety
Build it
Three types of people
• Experimentalists
• Modellers
• Bioinformaticians
[Diagram: the three groups linked by arrows labelled "Exchange"]
„Natural“ collaboration within SysMO
Short, simplified, black and white:
• Collaboration during project design
• Varying methods of collaboration during the project
  • Binomes (one modeller, one experimentalist)
  • Groups collaborating with groups (occasional/formalized exchange of information)
  • Varying success
• Need for a watering hole/meeting point
  • Application where experimentalists, bioinformaticians and modellers meet
(Photo: "Hot Watering Hole Action" by betty x1138, NYC, USA, taken 2005-07-14, http://flickr.com/photos/98334721@N00/25901056)
Trying to make experimentalists,
modellers, bioinformaticians
peacefully share resources
Some numbers & some consequences
1 software engineer, 1 bioinformatician, 1 bio-database specialist
• 11 projects, 91 partners
• 20 programmer days/year/project
• 2.5 programmer days/year/partner
• "Just in case" approach impossible
80-20 rule: 80% of the features won't be used anyway; 20% are the useful features
Focus on real needs
• "Just in time", "just enough"
• The right 20%
• Help people help themselves
• Communication!
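The programmer-day figures on this slide follow from a simple division. The sketch below is only an illustration: the figure of roughly 220 working days per year for the single software engineer is an assumption, chosen because it reproduces the 20 days per project quoted on the slide. Under that assumption the per-partner value comes out at about 2.4 days, close to the 2.5 on the slide.

```python
# Assumption (not from the slides): one software engineer works
# roughly 220 days per year.
WORKING_DAYS_PER_YEAR = 220
PROJECTS = 11  # from the slides
PARTNERS = 91  # from the slides


def days_per_unit(total_days: float, units: int) -> float:
    """Programmer days available per project (or partner) per year."""
    return total_days / units


per_project = days_per_unit(WORKING_DAYS_PER_YEAR, PROJECTS)
per_partner = days_per_unit(WORKING_DAYS_PER_YEAR, PARTNERS)

print(f"{per_project:.1f} programmer days/year/project")
print(f"{per_partner:.1f} programmer days/year/partner")
```

With so few days per partner, building features "just in case" is not affordable, which is the point the slide makes.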
Social Approach
• Questionnaires
• PALs (Project Area Liaisons)
  • 21 postdocs and PhD students
  • Bio/bioinf/modeller
  • Our design and technical collaboration team
  • Very intense face-to-face and virtual collaboration
  • UK and Continental PALs Chapters
• Audits and sharing
  • Methods, data, models, standards, software, schemas, spreadsheets, SOPs…
Communication via PALs
• DB team: show what is there, suggest what is possible, ask for requirements, double check
• PALs: transmit, disseminate, collect answers
• Projects: give requirements, tell priorities, rate outcomes, suggest improvements
Outcome of first PALs meeting:
• Need to find the guy who does xyz: Yellow pages
• Need to store Standard Operating Procedures
• Almost all our data is Excel
What's there
SysMO-SEEK screenshots
Yellow pages
ISA tabs
Yellow pages tabs
Bookmarks
Tag clouds
Standard Operating Procedures
JWS connection for modellers
View Study
New Assay (ISA)
Rights and sharing
Rights and sharing: create group
So much for the webapp
Rights+Sharing
Connection to modellers' tools
Yellow pages
SOPs
Almost there: improved Excel support
Matthew Horridge
Towards Just-Enough Exchange
Incremental steps from beta to beta
Largely a story about how to handle Excel sheets for the users' benefit
SysMO Just Enough Exchange
[Diagram: projects (SysMO-LAB, BaCell-SysMO, COSMIC, MOSES) and their resources (wikis, Alfresco, spreadsheets, SABIO-RK, BASE, public resources)]
Need for tradeoff
• Huge number of systems
• Huge number of standards (MIBBI, OBO…)
• Some of them big standards
Too much for a few people to cope with, but:
• Comparison needs standardisation
• Search needs standardisation
• Need to move incrementally to a just-enough standard implementation
Path = goal
The journey is part of the reward
• Let people use what they use anyway
• If changes are necessary, be as unintrusive as possible
• Be aware of legacy data
• Nudge people towards best practices
• Give instantly useful added value to as many users as possible: simple search, simple exchange, simple tool use
A roadmap
• Provide convincing Web 2.0 functionality for use and as appetizer
  • Yellow pages
  • SOPs
• Upload service:
  • Hand-triggered upload of link/file
  • Hand-added metadata
• Harvesting + change detection service
  • Automatic download
  • Hand-added metadata
• Support for Excel templates
  • Promote internal standards by use + tooling
  • Mappers + parsers
  • Classifiers
• Use other data types where appropriate
  • SBML, Matlab, Mathematica…
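The harvesting + change detection step in the roadmap can be illustrated with a content hash. This is a minimal sketch under stated assumptions, not the SysMO-DB implementation: the class name `ChangeDetector`, the file location string and the sample bytes are all hypothetical, and a real harvester would fetch the bytes from a project repository first.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """SHA-256 digest of a harvested file's content."""
    return hashlib.sha256(data).hexdigest()


class ChangeDetector:
    """Remembers the last digest seen for each location (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen = {}  # location -> digest of the last harvested content

    def check(self, location: str, data: bytes) -> str:
        """Return 'new', 'unchanged' or 'changed' for the harvested content."""
        digest = content_digest(data)
        previous = self._seen.get(location)
        self._seen[location] = digest
        if previous is None:
            return "new"
        return "unchanged" if previous == digest else "changed"


detector = ChangeDetector()
location = "projectA/growth_curve.xls"  # hypothetical location
first = detector.check(location, b"time,od600\n0,0.05\n")
second = detector.check(location, b"time,od600\n0,0.05\n")
third = detector.check(location, b"time,od600\n0,0.05\n1,0.11\n")
print(first, second, third)
```

Only files reported as new or changed would need to go back to a person for hand-added metadata.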
Stability hierarchy
[Diagram: templates and data models ordered by increasing stability]
• Single group: template for a group of experiments
• Single SysMO project: project-level template
• Whole SysMO: more stable JERM data model
• Use mappers where needed
• Parsers/annotators: enter into that more stable JERM data model
• Template best practice
JERM Extraction Architecture
[Diagram, top to bottom: parser; extractors and mappers; classifier/dispatcher; template recognizers; harvester; data handlers; project repositories. Data and metadata pass between the layers.]
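The components named in the extraction architecture can be read as stages of a small pipeline. The sketch below is one possible reading, not the actual JERM code: the template names, column headers, field names and every function name are hypothetical, and a spreadsheet is simplified to a list of rows whose first row is the header.

```python
from typing import Dict, List, Optional

Sheet = List[List[str]]  # first row is the header
Record = Dict[str, str]

# Hypothetical templates: template name -> expected header columns.
TEMPLATES: Dict[str, List[str]] = {
    "growth_curve": ["Time (h)", "OD600"],
    "enzyme_assay": ["Enzyme", "Substrate", "Km (mM)"],
}

# Hypothetical mappers: template column -> common field name.
MAPPERS: Dict[str, Dict[str, str]] = {
    "growth_curve": {"Time (h)": "time_h", "OD600": "optical_density"},
    "enzyme_assay": {"Enzyme": "enzyme", "Substrate": "substrate", "Km (mM)": "km_mM"},
}


def recognize_template(sheet: Sheet) -> Optional[str]:
    """Template recognizer: match the header row against known templates."""
    header = sheet[0] if sheet else []
    for name, columns in TEMPLATES.items():
        if header == columns:
            return name
    return None


def extract(sheet: Sheet) -> List[Record]:
    """Extractor: turn each data row into a column -> value record."""
    header, rows = sheet[0], sheet[1:]
    return [dict(zip(header, row)) for row in rows]


def map_record(record: Record, template: str) -> Record:
    """Mapper: rename template columns to the common field names."""
    mapping = MAPPERS[template]
    return {mapping[column]: value for column, value in record.items()}


def process(sheet: Sheet) -> List[Record]:
    """Dispatcher: route a harvested sheet through extractor and mapper."""
    template = recognize_template(sheet)
    if template is None:
        return []  # unrecognized sheets yield no records in this sketch
    return [map_record(record, template) for record in extract(sheet)]


print(process([["Time (h)", "OD600"], ["0", "0.05"], ["1", "0.11"]]))
```

Keeping recognizers, extractors and mappers as separate steps means a new project template only needs a new entry in the two tables, not a new pipeline.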
Oops
Some projects not prolonged
Need all project data in the system fast,
so…
[Diagram: the JERM Extraction Architecture, shown again]
Lessons we're learning
Some interesting bits along the way
Subsetting: Don't overwhelm
• Standards need to be comprehensive
• Goal: "Minimum information"… (MIBBI)
• Tends to be a superset of what is needed for a project
• Examples of non-applicable attributes
  • Tissue of a single cell
  • Gender
• Useful to use adapted subset-templates
[Screenshot: experimental design selection list]
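An adapted subset-template can be thought of as a comprehensive checklist with the non-applicable attributes filtered out. The sketch below assumes a hypothetical checklist: only "tissue" and "gender" come from the slide, the other attribute names are invented for illustration.

```python
from typing import Iterable, List, Set

# Hypothetical comprehensive checklist; only "tissue" and "gender"
# are taken from the slide.
FULL_CHECKLIST: List[str] = [
    "organism", "strain", "tissue", "gender", "growth_medium", "temperature",
]

# Attributes that make no sense for a single-cell micro-organism project.
NOT_APPLICABLE: Set[str] = {"tissue", "gender"}


def subset_template(checklist: Iterable[str], not_applicable: Set[str]) -> List[str]:
    """Keep the checklist order but drop attributes the project cannot fill in."""
    return [attribute for attribute in checklist if attribute not in not_applicable]


project_template = subset_template(FULL_CHECKLIST, NOT_APPLICABLE)
print(project_template)
```

The subset still maps back onto the full standard, so data entered through it stays comparable.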
From biofolksonomy to ontology
Observation:
• Fast growing set of standards
• Standards are a moving target
Incremental approach
• Tags + suggestions
• Keyword annotation
• Controlled selection lists
• Home-brewed taxonomies
• Use of/contribution to standard ontologies
Provide migration tools
[Figure: home-brewed taxonomy]
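The step from free tags to controlled selection lists could be supported by a small migration helper that suggests the closest controlled term for each tag. This is a sketch under stated assumptions: the vocabulary and the sample tags are hypothetical, and fuzzy string matching from Python's standard `difflib` module stands in for whatever matching a real migration tool would use.

```python
import difflib
from typing import Dict, List, Tuple

# Hypothetical controlled vocabulary (lower case).
CONTROLLED_TERMS: List[str] = [
    "glycolysis", "transcriptomics", "metabolomics", "chemostat",
]


def suggest_terms(tag: str, vocabulary: List[str], limit: int = 3) -> List[str]:
    """Suggest controlled terms that closely match a free-text tag."""
    return difflib.get_close_matches(tag.lower(), vocabulary, n=limit, cutoff=0.6)


def migrate_tags(tags: List[str], vocabulary: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Map each tag to its best controlled term; collect tags with no match."""
    migrated: Dict[str, str] = {}
    unmapped: List[str] = []
    for tag in tags:
        matches = suggest_terms(tag, vocabulary, limit=1)
        if matches:
            migrated[tag] = matches[0]
        else:
            unmapped.append(tag)
    return migrated, unmapped


migrated, unmapped = migrate_tags(["Glycolisis", "xyz"], CONTROLLED_TERMS)
print(migrated, unmapped)
```

Unmapped tags are kept rather than discarded, so they can be reviewed by hand or grow into a home-brewed taxonomy.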
A word on software
• Template tooling: Excel, Java
• SysMO-SEEK (open source under Apache license)
  • Ruby on Rails
  • Libraries & plugins
    • Convention over configuration
    • Rails specific (e.g. acts_as_authenticated)
    • SOLR & Lucene introduce Java/Ruby
  • Database: MySQL, also tested with SQLite (exclude db dependencies)
Summary
• SysMO-DB as a virtual meeting point for different flavours of systems biologists
• SysMO-DB's mantra: just enough, just in time
• Flexible JERM extraction architecture
• Just enough metadata (incremental)
• A lot done, still a lot to do
Challenges ahead…
Social
• PALs work great and are motivated
• Now need moremoremore datadatadata
Technical
• Publishing into public repositories
• Search + exploration: the test for data quality
• Hierarchical faceted search
• Distributed search via Taverna workflows
• More workflows via SysMO-SEEK
• Improve modelling support
Bonus track: what if…
…the average data quality is below par?
• "Nagging functionality"
  • Remind people of potentially faulty metadata
  • Give suggestions on what to improve and how
  • Give the possibility to create automatic mappings
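A nagging function of this kind could be as simple as a check that returns reminders with a concrete suggestion for each problem. The sketch below is hypothetical throughout: the required fields, the placeholder values and the wording of the reminders are assumptions, not part of SysMO-SEEK.

```python
from typing import Dict, List

# Hypothetical required metadata fields and placeholder values.
REQUIRED_FIELDS: List[str] = ["organism", "strain", "contact"]
PLACEHOLDERS = {"unknown", "n/a", "tbd", "?"}


def nag(metadata: Dict[str, str]) -> List[str]:
    """Return one reminder per missing or placeholder metadata field."""
    reminders: List[str] = []
    for field in REQUIRED_FIELDS:
        value = str(metadata.get(field, "")).strip()
        if not value:
            reminders.append(f"'{field}' is missing: please add it.")
        elif value.lower() in PLACEHOLDERS:
            reminders.append(
                f"'{field}' is set to the placeholder '{value}': "
                "please replace it with a real value."
            )
    return reminders


for reminder in nag({"organism": "Bacillus subtilis", "strain": "TBD"}):
    print(reminder)
```

An empty list means there is nothing to nag about, so the same function can drive both a reminder e-mail and a data quality indicator.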
Thanks
• EML people: Isabel, Olga
• UMAN people: Carole, Katy, Finn, Stuart, Sergejs
• Jacky at Stellenbosch
• BBSRC, BMBF, KTF
• …and Microsoft for sponsoring this workshop
www.sysmo-db.org
End + questions