MIBBI: Background, Context and Plans
Chris Taylor
chrisftaylor@gmail.com
http://mibbi.org/
Mechanisms of scientific advance
Well-oiled cogs meshing perfectly (would be nice)
How well are things working?
— Cue the Tower of Babel analogy…
— Situation is improving with respect to standards
— But few tools, fewer carrots (though some sticks)
Why do we care about that..?
— Data exchange / deposition
• Comprehensibility (/quality) of work
• Scope for reuse (parallel or orthogonal)
“Publicly-funded research data are a public
good, produced in the public interest”
“Publicly-funded research data should be openly
available to the maximum extent possible.”
ProteoRED’s MIAPE satisfaction survey
• Spanish multi-site collaboration: provision of proteomics services
• MIAPE customer satisfaction survey (compiled November 2008)
— http://www.proteored.org/MIAPE_Survey_Results_Nov08.html
— Responses from 31 proteomics experts representing 17 labs
Yes: 95%
No: 5%
Modelling the biosciences (inefficiently)
Biologically-delineated views of the world
A: plant biology
B: epidemiology
C: microbiology
…and…
Generic features (‘common core’)
— Description of source biomaterial
— Experimental design components
[Figure: technology components shown in the diagram: arrays, scanning, MS, gels, NMR, FTIR, columns]
Technologically-delineated views of the world
A: transcriptomics
B: proteomics
C: metabolomics
…and…
‘Omics’ is about as useful as a chocolate teapot
Investigation: Medical syndrome, environmental effect, etc.
Study: Toxicology, environmental science, etc.
Assay: Omics and miscellaneous techniques
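The Investigation/Study/Assay (ISA) nesting above can be pictured as a simple tree. The sketch below is a minimal illustration, assuming one plain-Python dataclass per level; the class names, fields and example content are hypothetical and are not the ISA-Tab format or the data model of the ISA software.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical, minimal model of the ISA nesting (not the ISA-Tab specification).
@dataclass
class Assay:
    """A single measurement activity, e.g. an omics technique."""
    name: str
    technology: str


@dataclass
class Study:
    """A unit of research (e.g. toxicology) that groups one or more assays."""
    name: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """The overall question (e.g. an environmental effect) that groups studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def outline(self) -> List[str]:
        """Return one indented line per node, investigation first."""
        lines = [f"Investigation: {self.title}"]
        for study in self.studies:
            lines.append(f"  Study: {study.name}")
            for assay in study.assays:
                lines.append(f"    Assay: {assay.name} ({assay.technology})")
        return lines


# Invented example, loosely echoing the 'eco-toxico-proteomics' case.
example = Investigation(
    title="Impact of a toxin on a sentinel species",
    studies=[
        Study(
            name="Toxicology",
            assays=[
                Assay("Proteomics", "mass spectrometry"),
                Assay("Transcriptomics", "DNA microarray"),
            ],
        )
    ],
)

print("\n".join(example.outline()))
```

Keeping the three levels separate is what lets one study carry assays from several technology domains without duplicating the study-level description.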
Reporting guidelines — a case in point
• MIAME, MIAPE, MIAPA, MIACA, MIARE, MIFACE, MISFISHIE, MIGS,
MIMIx, MIQAS, MIRIAM, (MIAFGE, MIAO), My Goodness…
• ‘MI’ checklists usually developed independently, by groups working
within particular biological or technological domains
— Difficult to obtain an overview of the full range of checklists
— Tracking the evolution of single checklists is non-trivial
— Checklists are inevitably partially redundant one against another
— Where they overlap, arbitrary decisions on wording and sub-structuring make integration difficult
• Significant difficulties for those who routinely combine information
from multiple biological domains and technology platforms
— Example: An investigation looking at the impact of toxins on a
sentinel species using proteomics (‘eco-toxico-proteomics’)
— What reporting standard(s) should they be using?
The MIBBI Project (mibbi.org)
[Figure: Comparison of MIBBI-registered projects [21] (2008-04-10; version 0.7). Legend and axes: granularity (coarse, medium, fine); maturity (planned, drafting, release); concept to specialisation, with concepts including generic organism, study procedures, assay inputs and study design. Projects shown: MINI, MIMPP, MIMIx, MIGS/MIMS, MIGen, MIFlowCyt, MIARE, MIAPE [†], MIAPA, MIAME/Tox, MIAME/Plant, MIAME/Nutr, MIAME/Env, MIAME, MIACA, CIMR [†]. [†] Denotes that a specification is provided as a suite of related documents.]
Interaction graph for projects (line thickness & colour saturation show similarity)
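The slide does not say how the similarity behind the graph was scored. As one plausible way to produce such edge weights, the sketch below assumes each checklist is reduced to a set of normalised requirement names and scores each pair with Jaccard similarity (shared items divided by all distinct items). The checklist contents are invented for illustration only and are not taken from the real MIAPE, MIFlowCyt or MIGS documents.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical checklist contents, reduced to normalised requirement names.
EXAMPLE_CHECKLISTS: Dict[str, Set[str]] = {
    "MIAPE": {"sample description", "study design", "mass spectrometry", "gel electrophoresis"},
    "MIFlowCyt": {"sample description", "study design", "flow cytometry", "fluorescence reagent"},
    "MIGS": {"sample description", "geographic location", "nucleic acid sequencing"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared items divided by all distinct items (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(
    checklists: Dict[str, Set[str]], threshold: float = 0.0
) -> List[Tuple[str, str, float]]:
    """Score every pair of checklists; keep pairs above threshold, strongest first."""
    edges = []
    for (name_a, items_a), (name_b, items_b) in combinations(sorted(checklists.items()), 2):
        score = jaccard(items_a, items_b)
        if score > threshold:
            edges.append((name_a, name_b, round(score, 3)))
    # sorted() is stable, so equal scores keep their alphabetical pair order.
    return sorted(edges, key=lambda edge: edge[2], reverse=True)


for edge in similarity_edges(EXAMPLE_CHECKLISTS):
    print(edge)
```

In a drawing, a higher score would map to a thicker, more saturated line; the threshold simply hides weak links.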
Drafting MIBBI Foundry modules
Analytical approach proved ‘challenging’
• Cross analyses were either too coarse or too depressing
• Conclusion: no ‘perfect’ solution…
If in doubt, hack (a.k.a. ‘iterative development’)
Start with one set of guidelines, breaking it into ‘paragraphs’
Add another set, breaking it up similarly (‘shared subject’)
Where there are overlaps, seek to resolve
— If similar, aim for an ‘average’ module
— If distinct, use core and extension modules
— Record dependencies in a matrix (for reference)
• ‘Normalise’ (look for efficiencies, to a point)
Validation
• Asking for something like MIxxx should get something like MIxxx
• Weigh the conflicts/compromises; re-examine extensions, etc.
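The iterative procedure and its validation check can be made concrete in a few lines. This is a minimal sketch under stated assumptions: each checklist and each Foundry module is treated as a flat set of items, the dependency matrix records which modules a checklist is rebuilt from, and validation reports which original items the reassembled modules miss and which extra items they add (the compromises to weigh). All module contents and mappings are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical Foundry modules, each a set of reporting items.
MODULE_ITEMS: Dict[str, Set[str]] = {
    "Sample description": {"organism", "tissue"},
    "Geographic location": {"latitude", "longitude"},
    "Mass spectrometry": {"instrument model", "ion source"},
    "Nucleic acid sequencing": {"sequencing method"},
}

# Hypothetical original checklists, already broken into items ('paragraphs').
CHECKLIST_ITEMS: Dict[str, Set[str]] = {
    "MIAPE": {"organism", "tissue", "instrument model", "ion source"},
    "MIGS": {"organism", "latitude", "longitude", "sequencing method"},
}

# Hypothetical mapping: which modules each checklist is rebuilt from.
MAPPING: Dict[str, List[str]] = {
    "MIAPE": ["Sample description", "Mass spectrometry"],
    "MIGS": ["Sample description", "Geographic location", "Nucleic acid sequencing"],
}


def dependency_matrix(mapping: Dict[str, List[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the sorted module names and a 0/1 row per checklist."""
    modules = sorted({m for mods in mapping.values() for m in mods})
    rows = {name: [1 if m in mods else 0 for m in modules] for name, mods in mapping.items()}
    return modules, rows


def validate(checklist: str) -> Tuple[Set[str], Set[str]]:
    """Compare a checklist with its reassembled modules: (missing items, extra items)."""
    assembled: Set[str] = set().union(*(MODULE_ITEMS[m] for m in MAPPING[checklist]))
    original = CHECKLIST_ITEMS[checklist]
    return original - assembled, assembled - original


modules, rows = dependency_matrix(MAPPING)
print(modules)
for name in rows:
    print(name, rows[name], validate(name))
```

Here asking for 'MIGS' returns everything MIGS asked for plus one extra item (tissue) inherited from the shared sample module; that is the kind of compromise the validation step is meant to surface.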
Current coverage: Portal versus Foundry
Checklists covered to date (x)
• MIGS/MIMS, MIAPE, MIFlowCyt, MIARE, ‘Env’ extensions
Modules developed to date
• 35 (set to rise rapidly)…
Investigation
Study design
Study overview
Organism L1
Organismal genetic component
Cell culture
Environment
Geographic location
Sampling event
Sample description
Biological sample description
Sample size
Sample processing
Person or responsible role
Date
Time
Data set
RNAi assay data set
Reagent
Fluorescence reagent
Publication
Organization
Database record

Column chromatography
Nucleic acid sequencing
Mass spectrometry
Capillary electrophoresis
Flow cytometry
Gel electrophoresis
RNAi assay
Nucleic acid sequencing data processing
Mass spectrometry informatics
Flow cytometry data analysis
Gel informatics
RNAi assay data analysis

Project-specific extensions
MIGS Investigation
MIGS Organism
‘Pedro’ tool → XML → (via XSLT) Wiki code (etc.)
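The pipeline above has the Pedro tool produce XML, which is then transformed via XSLT into wiki code and other outputs. The sketch below mimics only that last step, using Python's standard `xml.etree.ElementTree` in place of an XSLT stylesheet (lxml's `etree.XSLT` would be the closer equivalent). The XML element and attribute names (`checklist`, `module`, `item`, `requirement`) and the MediaWiki-style output are assumptions, not the actual MIBBI schema.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical XML layout; the real Pedro/MIBBI schema may differ.
EXAMPLE_XML = """
<checklist name="MIFlowCyt">
  <module name="Sample description">
    <item requirement="mandatory">Organism</item>
    <item>Tissue</item>
  </module>
  <module name="Flow cytometry">
    <item requirement="mandatory">Instrument manufacturer</item>
  </module>
</checklist>
"""


def checklist_to_wiki(xml_text: str) -> str:
    """Render checklist XML as MediaWiki-style headings and bullet items."""
    root = ET.fromstring(xml_text.strip())
    lines: List[str] = [f"= {root.get('name')} ="]
    for module in root.findall("module"):
        lines.append(f"== {module.get('name')} ==")
        for item in module.findall("item"):
            # Items without an explicit requirement level are treated as optional.
            level = item.get("requirement", "optional")
            lines.append(f"* {(item.text or '').strip()} ({level})")
    return "\n".join(lines)


print(checklist_to_wiki(EXAMPLE_XML))
```

Swapping the renderer (HTML, plain text, a spreadsheet template) leaves the XML untouched, which is the point of keeping XML as the single source.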
MICheckout: Supporting Users
Future direction for MICheckout?
Current status
• Very simple interface
— Pick what you want, in the order you want
— Download or view in the format you want
• Issues with the current interface
— Pick what you want, in the order you want (=anarchy)
— No way to work out everything that you need (fiddly bits)
Different approaches
1. Wizard-based Q&A for normal users, plus ‘advanced’ interface
— Simple ordered (ISA) questions for users; high-level concepts
— Advanced interface similar to the current one
2. Domain-specific-MI-based concepts as keys/shortcuts
— “I normally get MIxxx – please give me the equivalent”
— Advanced access similar to #1
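Both approaches target the two problems listed for the current interface: picks arrive in arbitrary order, and users cannot tell which supporting modules they also need. One hedged way a checkout step could handle this is sketched below: expand any "I normally get MIxxx" shortcut into modules, add each module's prerequisites, then sort the result into ISA order (investigation, then study, then assay). The dependency table, ISA levels and the MIAPE shortcut are hypothetical and do not describe MICheckout's actual behaviour.

```python
from typing import Dict, List

# Hypothetical ISA level of each module: 0 = investigation, 1 = study, 2 = assay.
ISA_LEVEL: Dict[str, int] = {
    "Investigation": 0,
    "Study design": 1,
    "Organism": 1,
    "Sample description": 1,
    "Mass spectrometry": 2,
    "Mass spectrometry informatics": 2,
}

# Hypothetical prerequisites: selecting a module pulls these in as well.
REQUIRES: Dict[str, List[str]] = {
    "Study design": ["Investigation"],
    "Sample description": ["Organism"],
    "Mass spectrometry": ["Sample description"],
    "Mass spectrometry informatics": ["Mass spectrometry"],
}

# Hypothetical 'I normally get MIxxx' shortcut.
SHORTCUTS: Dict[str, List[str]] = {
    "MIAPE": ["Mass spectrometry", "Mass spectrometry informatics"],
}


def checkout(request: List[str]) -> List[str]:
    """Expand shortcuts, add prerequisites, and return modules in ISA order."""
    stack: List[str] = []
    for name in request:
        stack.extend(SHORTCUTS.get(name, [name]))
    selected = set()
    while stack:  # depth-first walk over the prerequisite table
        module = stack.pop()
        if module not in selected:
            selected.add(module)
            stack.extend(REQUIRES.get(module, []))
    # Unknown modules (no ISA level) go last; ties are broken alphabetically.
    return sorted(selected, key=lambda m: (ISA_LEVEL.get(m, 3), m))


print(checkout(["Mass spectrometry informatics"]))
print(checkout(["MIAPE", "Study design"]))
```

Whatever order the user picks in, the output order is fixed by ISA level, while advanced users can still name modules directly.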
http://isa-tools.org
Example of guiding the experimentalist to search and select a term from
the EnvO ontology, to describe the habitat of a sample
(Ontologies are accessed in real time via the Ontology Lookup Service and BioPortal.)
The International Conference on Systems Biology (ICSB), 22-28 August, 2008
Susanna-Assunta Sansone
www.ebi.ac.uk/net-project
BII @ the NERC Environmental Bioinformatics Centre
ISA Software Users
Several groups have now begun to use all or part of the ISA software suite
Easy to get going by using the data entry tool alone (ISAcreator)
Power users can reconfigure ISAcreator to meet local needs (ISAconfigurator)
Some skill required to install the full suite (back-end stuff)
Satisfies two needs:
1. Internal data management
2. Requirement to share data
(@ISMB 2011: http://is.gd/biosharing_ISMB_2011)
The BioSharing project provides stable web-based catalogues and a user forum. The project seeks to:
• Build links between journals, funders and well-constituted standardization efforts in the biosciences; e.g., BMC http://is.gd/WIMqz3
• Expedite the production of an integrated standards-based framework for the biosciences
Coming soon:
• IDs/DOIs for all items
• Domain-specific views of standards (feedback required: http://is.gd/biosharing_feedback)
MIBBI and BioSharing: Proposals to PSI
BioSharing
• Provide/maintain up-to-date information (content)
• Offer feedback on the site’s functionality as it matures
MIBBI: three options
1. Maintain status quo: MIBBI (and BioSharing) scrape information
— Passive participation only; no real impact (or additional benefit)
— Draw on MIBBI for description of sample and study context only
2. Use the MIBBI Portal as the source for the most current MIAPE (+?)
— MIBBI XML can be transformed into several output types
— MIBBI and BioSharing sites increasingly visible to users
3. Participate in the MIBBI Foundry activity (as well as the Portal)
— Maintain ‘independent’ MIAPE documents (Portal), but...
— Take (joint) ownership of the appropriate Foundry modules
— Use the Foundry to re-engineer MIAPE+ where necessary
— Show support for integrated cross-domain reporting
Acknowledgements
MIBBI
Chris Taylor (EBI, NEBC), Susanna-Assunta Sansone (U. Oxford), Dawn Field (NEBC),
contributions from participants in MIBBI-registered projects.
BioSharing
Susanna Sansone (U. Oxford), Dawn Field (NEBC), Philippe Rocca-Serra (U. Oxford)
Annapaola Santarsiero (Mario Negri Institute; U. Oxford), Eamonn Maguire (U. Oxford),
Chris Taylor (EBI, NEBC), contributions from numerous communities and individuals.
ISA Infrastructure
Susanna-Assunta Sansone, Philippe Rocca-Serra, Eamonn Maguire (U. Oxford);
Chris Taylor, Marco Brandizi, Gabriella Rustici, Nataliya Sklyar, Manon Delahaye, Richard Evan (EBI);
Kimberly Begley, Dorothy Reilly, Oliver Hofmann, Winston Hide (Harvard School of Public Health);
Hong Fang, Joshua Xu, Martin Jackson, Jie Zhang, Stephen Harris, Weida Tong (FDA Center for Bioinformatics);
Tim Booth, Bela Tiwari, Norman Morrison, Dawn Field (NEBC);
Steffen Neumann (Leibniz Institute of Plant Biochemistry);
Peter Sterk, Jack Gilbert, Folker Meyer, Linda Amaral-Zettler, Dawn Field (GSC);
Alain Zasadzinski, Marie-Christine Jacquemot, Florian Mazur, Damien Fleury, Yahia Berchi, Morad Mercheref, Claude Niederlander, Magali Roux (CNRS Institute of Biological Sciences);
Audrey Kauffman (Bergonie Cancer Institute);
Miroslaw Dylag (Mentor Software Ltd.).
Funding
NEBC, NERC, BBSRC.
The objections to fuller reporting
• Why should I dedicate resources to providing data to others?
— Pro bono arguments have no impact (altruism is a myth)
— Sticks wielded by funders and publishers get the bare minimum
— No traceability in most contexts (intellectual property = ?)
— Loss of competitive advantage (both direct and indirect)
• This is just a ‘make work’ scheme for bioinformaticians
— Bioinformaticians get a buzz out of having big databases
— Parasites benefitting from others’ work (mutualism..?)
• I don’t trust anyone else’s data; I’d rather repeat work
— Problems of quality, which are justified to an extent
— But what of people lacking resources or specific expertise?
• How on earth am I supposed to do this anyway..?
— Perception that there is no money to pay for this
— No mature free tools; Excel sheets are no good for high-throughput (HT) data
— Worries about vendor support, legacy systems (business models)
Credit where credit’s due
• Data sharing is more or less a given now, and tools are emerging
— Lots of sticks, but they only get the bare minimum
— How to get the best out of data generators?
— Need standards- and user-friendly tools, and meaningful credit
• Central registries of data sets that can record deposit and reuse
— Well-presented, detailed papers get cited more frequently
— The same principle should apply to data sets (metadata, etc.)
— ORCIDs for people (orcid.org), DOIs for data (datacite.org)
• Side-benefits, challenges
— Would also clear up problems around paper authorship
— Would enable other kinds of credit (training, curation, etc.)
— Community policing — researchers ‘own’ their credit portfolio
(enforcement body useful, but most likely to be reviewers)
— Problem of ‘micro data sets’ and legacy data
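A central registry of this kind needs little more than two lookups: who deposited each data set, and which later works reused it. The toy sketch below assumes DOIs identify data sets and ORCID iDs identify people, and derives a per-person credit summary from them. The ORCID check-digit test follows the ISO/IEC 7064 MOD 11-2 scheme that ORCID documents for its iDs; the registry class, its methods and the example DOIs (under the placeholder prefix 10.1234) are hypothetical.

```python
from typing import Dict, Set


def orcid_is_valid(orcid: str) -> bool:
    """Check the final character of an ORCID iD (ISO/IEC 7064 MOD 11-2)."""
    chars = orcid.replace("-", "")
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15].upper() == expected


class DataRegistry:
    """Hypothetical registry recording deposits (DOI -> ORCID) and reuse events."""

    def __init__(self) -> None:
        self.depositor: Dict[str, str] = {}
        self.reused_by: Dict[str, Set[str]] = {}

    def deposit(self, doi: str, orcid: str) -> None:
        """Register a data set against a well-formed ORCID iD."""
        if not orcid_is_valid(orcid):
            raise ValueError(f"malformed ORCID iD: {orcid}")
        self.depositor[doi] = orcid
        self.reused_by.setdefault(doi, set())

    def record_reuse(self, doi: str, citing_doi: str) -> None:
        """Record that citing_doi reused the data set; repeats are counted once."""
        if doi not in self.depositor:
            raise KeyError(f"unregistered data set: {doi}")
        self.reused_by[doi].add(citing_doi)

    def credit(self, orcid: str) -> Dict[str, int]:
        """Summarise one person's deposits and the distinct reuses of them."""
        mine = [doi for doi, who in self.depositor.items() if who == orcid]
        return {"deposits": len(mine), "reuses": sum(len(self.reused_by[d]) for d in mine)}


registry = DataRegistry()
registry.deposit("10.1234/example-dataset.1", "0000-0002-1825-0097")
registry.record_reuse("10.1234/example-dataset.1", "10.1234/example-paper.7")
registry.record_reuse("10.1234/example-dataset.1", "10.1234/example-paper.7")
print(registry.credit("0000-0002-1825-0097"))
```

Counting each citing work once per data set is a deliberate choice here; a real registry would also have to decide how to treat 'micro data sets' and legacy records.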