SysMO-DB: Just Enough Exchange for Systems Biology Data and Models

Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs - University of Manchester
Wolfgang Müller, O. Krebs, Isabel Rojas - EML Research gGmbH (= not for profit)
Jacky Snoep - University of Stellenbosch

MS eScience Workshop, Pittsburgh, PA

SysMO = SYStems biology of Micro Organisms
  11 projects, 91 partners, 9 countries, started 2007
  [Map figure; numeric labels: (4) (9) (1) (2) (22) (29) (2)]

SysMO-DB
  Started July 2008, 3 years, 3 staff + 3 investigators, 3 teams over 3 sites
  Sensitively retrofit a data access, model handling and data integration platform.
  Support and manage the diversity of data, models and competencies.
  Web-based solution:
    exchange of data, models and processes (intra- and inter-consortia).
    search for data, models and processes across the initiative.
    dissemination of results.

SysMO-DB Team
  EML Research gGmbH, Germany: Wolfgang Müller, Isabel Rojas, Olga Krebs
  University of Manchester, UK: Carole Goble, Katy Wolstencroft, Stuart Owen, Sergejs Aleksejevs
  University of Stellenbosch, South Africa: Jacky Snoep

Connect projects, connect to outside
  Public: Outside data and tools
  SysMO-DB, inter-project
  Project: Project specific solutions; internally used tools & data
  Personal: My Disk (data, models, workflows); own solutions

Own data solutions and collaboration environments.
  wikis, e-Groupware, PHProject, BaseCamp, PLONE, Alfresco, bespoke commercial …
  files and spreadsheets.

Suspicion
  Suspicion and caution over sharing.
  Interesting interplay between modellers, experimentalists and bioinformaticians.

Data issues
  Many do not have data, follow the standards that exist, or know who is doing what.
  Much of the data cannot be compared
  Different organisms, different strains.

Resource Issues
  No extra resources for the consortia
  91 institutes, 11 consortia, some overlapping

Principles…
  Go for a series of small victories
  Realistic
  Don't reinvent
  Migrate to standards
  Sustainable and extensible
  Provide instant gratification
  Address doubt and anxiety
  Build it

Three types of people
  Experimentalists
  Modellers
  Bioinformaticians
  [Diagram: the three groups connected by "Exchange" arrows]

"Natural" collaboration within SysMO
  Short, simplified, black and white:
  Collaboration during project design
  Varying methods of collaboration during project
    Binomes (one modeller, one experimentalist)
    Groups collaborating with groups (occasional/formalized exchange of information)
  Varying success

Need for a watering hole/meeting point
  Application where experimentalists/bioinformaticians/modellers meet
  Trying to make experimentalists, modellers, bioinformaticians peacefully share resources
  [Image: "Hot Watering Hole Action" by betty x1138 (NYC, USA), taken 2005-07-14, http://flickr.com/photos/98334721@N00/25901056]

Some numbers & some consequences
  Numbers
    1 software engineer, 1 bioinformatician, 1 bio-database specialist
    11 projects, 91 partners
    20 programmer days/year/project
    2.5 programmer days/year/partner
  80-20 rule
    80% of the features won't be used anyway
    20% useful features
  Consequences
    "Just in case" approach impossible
    Focus on real needs: "just in time", "just enough"
    The right 20%
    Help people help themselves
    Communication!
    80% social

Approach
  Questionnaires
  PALs (Project Area Liaison)
    21 postdocs and PhD students
    Bio/bioinf/modeller
    Our design and technical collaboration team
    Very intense face-to-face and virtual collaboration
    UK and Continental PALs chapters
  Audits and sharing
    Methods, data, models, standards, software, schemas, spreadsheets, SOPs…
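
A back-of-the-envelope check of the "Some numbers" slide, as a minimal sketch: the project and partner counts come from the slide, while the figure of 220 working days per year for the single software engineer is an assumption, not something stated in the talk.

```python
# Figures from the slide.
PROJECTS = 11
PARTNERS = 91
SOFTWARE_ENGINEERS = 1

# Assumption (not from the talk): working days per engineer per year.
WORKING_DAYS_PER_YEAR = 220


def programmer_days_per_unit(units: int) -> float:
    """Programmer days available per project or per partner in one year."""
    return SOFTWARE_ENGINEERS * WORKING_DAYS_PER_YEAR / units


per_project = programmer_days_per_unit(PROJECTS)
per_partner = programmer_days_per_unit(PARTNERS)

print(f"{per_project:.1f} programmer days/year/project")  # 20.0
print(f"{per_partner:.1f} programmer days/year/partner")  # 2.4
```

Under that assumption the per-project figure matches the slide's 20 days, and the per-partner figure comes out close to the slide's rounded 2.5 days.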

Communication via PALs
  DB team: show what is there, suggest what is possible, ask for requirements, double check
  PALs: transmit, disseminate, collect answers
  Projects: give requirements, tell priorities, rate outcomes, suggest improvements

Outcome of first PALs meeting:
  Need to find the guy who does xyz: Yellow pages
  Need to store Standard Operating Procedures
  Almost all our data is Excel

What's there: SysMO-SEEK screenshots
  Yellow pages
  ISA tabs
  Yellow pages tabs
  Bookmarks
  Tag clouds
  Standard Operating Procedures
  JWS connection for modellers
  View Study
  New Assay (ISA)
  Rights and sharing
  Rights and sharing: create group

So much for the webapp
  Rights + sharing
  Connection to modellers' tools
  Yellow pages
  SOPs
  Almost there: improved Excel support (Matthew Horridge)

Towards Just-Enough Exchange
  Incremental steps from beta to beta
  Largely a story about how to handle Excel sheets for users' benefit

SysMO Just Enough Exchange
  [Diagram; labels: SysMO-LAB, BaCell-SysMO, COSMIC, MOSES, Wiki, Alfresco, Spreadsheets, SABIO-RK, BASE, Public Resources]

Need for tradeoff
  Huge number of systems
  Huge number of standards (MIBBI, OBO…)
    Some of them big standards
  Too much to cope with for a few people, but:
    Comparison needs standardisation
    Search needs standardisation
  Need to move incrementally to just-enough standard implementation

Path = goal
  The journey is part of the reward
  Let people use what they use anyway
  If changes are necessary, be as unintrusive as possible
  Be aware of legacy data
  Nudge people towards best practices
  Give instantly useful added value to as many users as possible:
    Simple search, simple exchange, simple tool use

A roadmap
  Provide convincing Web 2.0 functionality for use and as appetizer
    Yellow pages
    SOPs
  Upload service:
    Hand-triggered upload of link/file
    Hand-added metadata
  Harvesting + change detection service
    Automatic download
    Hand-added metadata
  Support for Excel templates
    Promote internal standards by use + tooling
  Mappers + parsers
  Classifiers
  Use other data types where appropriate
    SBML, Matlab, Mathematica…

Stability hierarchy
  [Diagram; labels in order of increasing stability: Single group (template for a group of experiments), Single SysMO project (project-level template), Whole SysMO (template best practice), JERM data model (more stable)]
  Use mappers where needed
  Parsers/annotators: enter into that
  Data, metadata

JERM Extraction Architecture
  [Diagram; labels from top to bottom: JERM (data, metadata), Parser, Extractors, Mappers, Classifier/Dispatcher, Template recognizers, Harvester, Data handlers, Project repositories]

Oops
  Some projects not prolonged
  Need all project data in the system fast, so…

JERM Extraction Architecture
  [Same diagram shown again]

Lessons we're learning
  Some interesting bits along the way

Subsetting: Don't overwhelm
  Standards need to be comprehensive
    Goal: "Minimum information"… (MIBBI)
    Tends to be a superset of what is needed for a project
  Examples of non-applicable attributes
    Tissue of a single cell
    Gender
  Useful to use adapted subset-templates
  [Screenshot: Experimental design selection list]

From biofolksonomy to ontology
  Observation:
    Fast-growing set of standards
    Standards are a moving target
  Incremental approach
    Tags + suggestions
    Keyword annotation
    Controlled selection lists
    Home-brewed taxonomies
    Use of/contribution to standard ontologies
  Provide migration tools
  [Screenshot: Home-brewed taxonomy]

A word on software
  Template tooling
    Excel
    Java
  SysMO-SEEK (open source under Apache license)
    Ruby on Rails
      Libraries & plugins
      Convention over configuration
      Rails specific (e.g. acts_as_authenticated)
    SOLR & Lucene introduce Java/Ruby
    Database: MySQL, also tested with SQLite (exclude db dependencies)

Summary
  SysMO-DB as a virtual meeting point for different flavours of systems biologists
  SysMO-DB's mantra: Just enough, just in time
  Flexible JERM extraction architecture
  Just enough metadata (incremental)
  A lot done, still a lot to do

Challenges ahead…
  Social
    PALs work great and motivated
    Now need more, more, more data, data, data
  Technical
    Publishing into public repositories
    Search + exploration: the test for data quality
      Hierarchical faceted search
      Distributed search via Taverna workflows
    More workflows via SysMO-SEEK
    Improve modelling support

Bonus track: what if…
  …the average data quality is below par?
  "Nagging functionality"
    Remind people of potentially faulty metadata
    Give suggestions what to improve and how
    Give possibility to create automatic mappings

Thanks
  EML people: Isabel, Olga
  UMAN people: Carole, Katy, Finn, Stuart, Sergejs
  Jacky at Stellenbosch
  BBSRC, BMBF, KTF
  …and Microsoft for sponsoring this workshop
  www.sysmo-db.org

End + questions
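
The "nagging functionality" from the bonus track could look roughly like the sketch below. It is not SysMO-SEEK code: the template layout, the field names and the `nag` function are all hypothetical, chosen only to illustrate reminding people of potentially faulty metadata and suggesting what to improve, using a subset template with a controlled selection list.

```python
from difflib import get_close_matches

# Hypothetical subset template: required fields plus controlled selection lists.
TEMPLATE = {
    "required": ["organism", "strain", "experiment_type"],
    "controlled": {
        "experiment_type": ["transcriptomics", "proteomics", "metabolomics"],
    },
}


def nag(metadata: dict, template: dict = TEMPLATE) -> list:
    """Return human-readable reminders for missing or suspicious metadata."""
    reminders = []
    # Remind about required fields that are absent or blank.
    for field in template["required"]:
        if not str(metadata.get(field, "")).strip():
            reminders.append(f"'{field}' is missing: please fill it in.")
    # Flag values outside the controlled list and suggest the closest term.
    for field, allowed in template["controlled"].items():
        value = str(metadata.get(field, "")).strip()
        if value and value not in allowed:
            close = get_close_matches(value.lower(), allowed, n=1)
            hint = f" Did you mean '{close[0]}'?" if close else ""
            reminders.append(f"'{field}' has unknown value '{value}'.{hint}")
    return reminders


# Illustrative record with one blank field and one near-miss term.
record = {
    "organism": "Bacillus subtilis",
    "strain": "",
    "experiment_type": "Transcriptomic",
}
for reminder in nag(record):
    print(reminder)
```

The closest-match suggestion is one simple way to nudge free-text tags towards a controlled selection list without rejecting the upload.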