Background, Context and Plans

MIBBI: Background, Context and Plans
Chris Taylor
chrisftaylor@gmail.com
http://mibbi.org/

Mechanisms of scientific advance

Well-oiled cogs meshing perfectly (would be nice)

How well are things working?
  - Cue the Tower of Babel analogy…
  - Situation is improving with respect to standards
  - But few tools, fewer carrots (though some sticks)

Why do we care about that..?
  - Data exchange / deposition
  - Comprehensibility (/quality) of work
  - Scope for reuse (parallel or orthogonal)

“Publicly-funded research data are a public good, produced in the public interest”
“Publicly-funded research data should be openly available to the maximum extent possible.”

ProteoRED’s MIAPE satisfaction survey

• Spanish multi-site collaboration: provision of proteomics services
• MIAPE customer satisfaction survey (compiled November 2008)
  - http://www.proteored.org/MIAPE_Survey_Results_Nov08.html
  - Responses from 31 proteomics experts representing 17 labs

[Survey chart: Yes 95%, No 5%]

Modelling the biosciences (inefficiently)

Biologically-delineated views of the world
A: plant biology
B: epidemiology
C: microbiology
…and…

Generic features (‘common core’)
  - Description of source biomaterial
  - Experimental design components

Technologically-delineated views of the world
A: transcriptomics
B: proteomics
C: metabolomics
…and…

[Diagram: technologies used across the domains, including arrays, scanning, MS, gels, columns, NMR and FTIR]

‘Omics’ is about as useful as a chocolate teapot

Investigation: Medical syndrome, environmental effect, etc.
Study: Toxicology, environmental science, etc.
Assay: Omics and miscellaneous techniques

Reporting guidelines: a case in point

• MIAME, MIAPE, MIAPA, MIACA, MIARE, MIFACE, MISFISHIE, MIGS, MIMIx, MIQAS, MIRIAM, (MIAFGE, MIAO), My Goodness…
• ‘MI’ checklists usually developed independently, by groups working within particular biological or technological domains
  - Difficult to obtain an overview of the full range of checklists
  - Tracking the evolution of single checklists is non-trivial
  - Checklists are inevitably partially redundant one against another
  - Where they overlap, arbitrary decisions on wording and sub-structuring make integration difficult
• Significant difficulties for those who routinely combine information from multiple biological domains and technology platforms
  - Example: An investigation looking at the impact of toxins on a sentinel species using proteomics (‘eco-toxico-proteomics’)
  - What reporting standard(s) should they be using?

The MIBBI Project (mibbi.org)

[Figure: Comparison of MIBBI-registered projects [21] (2008-04-10), version 0.7. Axis labels: specialisation, concept. Projects shown: MINI, MIMPP, MIMIx, MIGS/MIMS, MIGen, MIFlowCyt, MIARE, MIAPE†, MIAPA, MIAME/Tox, MIAME/Plant, MIAME/Nutr, MIAME/Env, MIAME, MIACA, CIMR†. Concepts shown: study design; study inputs; generic organism; cells / microbes; plant; animal; mouse; human; population; environmental sample; environment / habitat; in silico model; study procedures; organism maintenance; animal husbandry; cell / microbe culture; plant cultivation; acclimation; preconditioning / pretreatment; organism manipulation; assay inputs; generic study input; organism part; organism state; organism trait; biomolecule; synthetic analyte; silencing RNA; reagent. Granularity key: coarse, medium, fine. Maturity key: planned, drafting, release.]

† Denotes that a specification is provided as a suite of related documents

The MIBBI Project (mibbi.org)

[Figure: Interaction graph for projects (line thickness & colour saturation show similarity)]

Drafting MIBBI Foundry modules

Analytical approach proved ‘challenging’
• Cross analyses were either too coarse or too depressing
• Conclusion: no ‘perfect’ solution…

If in doubt, hack (a.k.a. ‘iterative development’)
• Start with one set of guidelines, breaking it into ‘paragraphs’
• Add another set, breaking it up similarly (‘shared subject’)
• Where there are overlaps, seek to resolve
  - If similar, aim for an ‘average’ module
  - If distinct, use core and extension modules
  - Record dependencies in a matrix (for reference)
• ‘Normalise’ (look for efficiencies, to a point)

Validation
• Asking for something like MIxxx should get something like MIxxx
• Weigh the conflicts/compromises; re-examine extensions etc.

Current coverage: Portal versus Foundry

Checklists covered to date (x)
• MIGS/MIMS, MIAPE, MIFlowCyt, MIARE, ‘Env’ extensions

Modules developed to date
• 35 (set to rise rapidly)…

[Module list, in slide order: Investigation; Column chromatography; Study design; Nucleic acid sequencing; Study overview; Mass spectrometry; Organism L1; Capillary electrophoresis; Organismal genetic component; Flow cytometry; Cell culture; Gel electrophoresis; Environment; RNAi assay; Geographic location; Nucleic acid sequencing data processing; Sampling event; Mass spectrometry informatics; Sample description; Flow cytometry data analysis; Biological sample description; Gel informatics; Sample size; RNAi assay data analysis; Sample processing; Person or responsible role; Date Time; Data set; RNAi assay data set; Reagent; Fluorescence reagent; Publication; Organization; Database record]

Project-specific extensions
• MIGS Investigation
• MIGS Organism

‘Pedro’ tool → XML → (via XSLT) Wiki code (etc.)

MICheckout: Supporting Users

Future direction for MICheckout?

Current status
• Very simple interface
  - Pick what you want, in the order you want
  - Download or view in the format you want
• Issues with the current interface
  - Pick what you want, in the order you want (=anarchy)
  - No way to work out everything that you need (fiddly bits)

Different approaches
1. Wizard-based Q&A for normal users, plus ‘advanced’ interface
  - Simple ordered (ISA) questions for users; high level concepts
  - Advanced interface similar to the current one
2. Domain-specific-MI-based concepts as keys/shortcuts
  - “I normally get MIxxx – please give me the equivalent”
  - Similar advanced access to #1

http://isa-tools.org

Example of guiding the experimentalist to search and select a term from the EnvO ontology, to describe the habitat of a sample. (Ontologies accessed in real time via the Ontology Lookup Service and BioPortal.)

The International Conference on Systems Biology (ICSB), 22-28 August, 2008
Susanna-Assunta Sansone
www.ebi.ac.uk/net-project

BII @ the NERC Environmental Bioinformatics Centre

ISA Software Users

• Several groups have now begun to use all or part of the ISA software suite
• Easy to get going by using the data entry tool alone (ISAcreator)
• Power users can reconfigure ISAcreator to meet local need (ISAconfigurator)
• Some skill required to install the full suite (back end stuff)

Satisfies two needs:
1. Internal data management
2. Requirement to share data

(@ISMB 2011: http://is.gd/biosharing_ISMB_2011)

The BioSharing project provides stable web-based catalogues and a user forum. The project seeks to:
• Build links between journals, funders and well-constituted standardization efforts in the biosciences; e.g., BMC http://is.gd/WIMqz3
• Expedite the production of an integrated standards-based framework for the biosciences

Coming soon:
• IDs/DOIs for all items
• Domain-specific views of standards

Feedback required: http://is.gd/biosharing_feedback

MIBBI and BioSharing: Proposals to PSI

BioSharing
• Provide/maintain up-to-date information (content)
• Offer feedback on the site’s functionality as it matures

MIBBI: three options
1. Maintain status quo: MIBBI (and BioSharing) scrape information
  - Passive participation only; no real impact (or additional benefit)
  - Draw on MIBBI for description of sample and study context only
2. Use the MIBBI Portal as the source for the most current MIAPE (+?)
  - MIBBI XML can be transformed into several output types
  - MIBBI and BioSharing sites increasingly visible to users
3. Participate in the MIBBI Foundry activity (as well as the Portal)
  - Maintain ‘independent’ MIAPE documents (Portal), but...
  - Take (joint) ownership of the appropriate Foundry modules
  - Use the Foundry to re-engineer MIAPE+ where necessary
  - Show support for integrated cross-domain reporting

Acknowledgements

MIBBI
Chris Taylor (EBI, NEBC), Susanna-Assunta Sansone (U. Oxford), Dawn Field (NEBC), contributions from participants in MIBBI-registered projects.

BioSharing
Susanna Sansone (U. Oxford), Dawn Field (NEBC), Philippe Rocca-Serra (U. Oxford), Annapaola Santarsiero (Mario Negri Institute; U. Oxford), Eamonn Maguire (U. Oxford), Chris Taylor (EBI, NEBC), contributions from numerous communities and individuals.

ISA Infrastructure
Susanna-Assunta Sansone, Philippe Rocca-Serra, Eamonn Maguire (U. Oxford); Chris Taylor, Marco Brandizi, Gabriella Rustici, Nataliya Sklyar, Manon Delahaye, Richard Evan (EBI); Kimberly Begley, Dorothy Reilly, Oliver Hofmann, Winston Hide (Harvard School of Public Health); Hong Fang, Joshua Xu, Martin Jackson, Jie Zhang, Stephen Harris, Weida Tong (FDA Center for Bioinformatics); Tim Booth, Bela Tiwari, Norman Morrison, Dawn Field (NEBC); Steffen Neumann (Leibniz Institute of Plant Biochemistry); Peter Sterk, Jack Gilbert, Folker Meyer, Linda Amaral-Zettler, Dawn Field (GSC); Alain Zasadzinski, Marie-Christine Jacquemot, Florian Mazur, Damien Fleury, Yahia Berchi, Morad Mercheref, Claude Niederlander, Magali Roux (CNRS Institute of Biological Sciences); Audrey Kauffman (Bergonie Cancer Institute); Miroslaw Dylag (Mentor Software Ltd.).

Funding
NEBC, NERC, BBSRC.

The objections to fuller reporting

• Why should I dedicate resources to providing data to others?
  - Pro bono arguments have no impact (altruism is a myth)
  - Sticks wielded by funders and publishers get the bare minimum
  - No traceability in most contexts (intellectual property = ?)
  - Loss of competitive advantage (both direct and indirect)
• This is just a ‘make work’ scheme for bioinformaticians
  - Bioinformaticians get a buzz out of having big databases
  - Parasites benefitting from others’ work (mutualism..?)
• I don’t trust anyone else’s data; I’d rather repeat work
  - Problems of quality, which are justified to an extent
  - But what of people lacking resources or specific expertise?
• How on earth am I supposed to do this anyway..?
  - Perception that there is no money to pay for this
  - No mature free tools; Excel sheets are no good for HT
  - Worries about vendor support, legacy systems (business models)

Credit where credit’s due

• Data sharing is more or less a given now, and tools are emerging
  - Lots of sticks, but they only get the bare minimum
  - How to get the best out of data generators?
  - Need standards- and user-friendly tools, and meaningful credit
• Central registries of data sets that can record deposit and reuse
  - Well-presented, detailed papers get cited more frequently
  - The same principle should apply to data sets (metadata, etc.)
  - ORCIDs for people (orcid.org), DOIs for data (datacite.org)
• Side-benefits, challenges
  - Would also clear up problems around paper authorship
  - Would enable other kinds of credit (training, curation, etc.)
  - Community policing: researchers ‘own’ their credit portfolio (enforcement body useful, but most likely to be reviewers)
  - Problem of ‘micro data sets’ and legacy data
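
Worked sketch: Foundry modules and dependencies

The Foundry drafting slide mentions recording module dependencies in a matrix, and the MICheckout slide notes that there is currently no way to work out everything that you need. A rough sketch of how those two ideas might fit together follows. The module names come from the Foundry module list, but the dependencies between them, the matrix layout and the function names `build_matrix` and `resolve_modules` are assumptions made purely for illustration; none of this describes how MIBBI or MICheckout is actually implemented.

```python
# Module names are taken from the Foundry list; the dependencies
# between them are HYPOTHETICAL and exist only for this example.
MODULES = [
    "Investigation",
    "Study design",
    "Sample description",
    "Mass spectrometry",
    "Mass spectrometry informatics",
]

DEPENDS_ON = {
    "Investigation": [],
    "Study design": ["Investigation"],
    "Sample description": ["Study design"],
    "Mass spectrometry": ["Sample description"],
    "Mass spectrometry informatics": ["Mass spectrometry"],
}


def build_matrix(modules, depends_on):
    """Build a square matrix where matrix[i][j] is True
    if modules[i] depends directly on modules[j]."""
    index = {name: i for i, name in enumerate(modules)}
    matrix = [[False] * len(modules) for _ in modules]
    for name, deps in depends_on.items():
        for dep in deps:
            matrix[index[name]][index[dep]] = True
    return matrix


def resolve_modules(requested, modules, matrix):
    """Return the requested modules plus everything they depend on,
    ordered so that prerequisites come before the modules needing them."""
    index = {name: i for i, name in enumerate(modules)}
    ordered = []
    seen = set()

    def visit(name):
        if name in seen:  # already handled (also guards against cycles)
            return
        seen.add(name)
        for j, needed in enumerate(matrix[index[name]]):
            if needed:
                visit(modules[j])
        ordered.append(name)

    for name in requested:
        visit(name)
    return ordered


MATRIX = build_matrix(MODULES, DEPENDS_ON)

if __name__ == "__main__":
    for module in resolve_modules(["Mass spectrometry informatics"], MODULES, MATRIX):
        print(module)
```

With these made-up dependencies, asking for ‘Mass spectrometry informatics’ alone returns all five modules, starting with ‘Investigation’. A wizard-style interface could use a lookup of this kind to fill in the ‘fiddly bits’ that a user did not pick explicitly.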
