The BioImage Database E-BioSci / ORIEL

E-BioSci / ORIEL Open Meeting, 28/4/03
National e-Science Centre, Edinburgh
The BioImage Database
David M. Shotton
Image BioInformatics Laboratory
Department of Zoology
University of Oxford, UK
e-mail: david.shotton@zoo.ox.ac.uk
© David Shotton 2003
The importance of image bioinformatics
Electron micrographs provide ultrastructural and molecular detail
Light micrographs are vital for localization of gene expression and functional analysis
Videos can show dynamic processes as they occur
In these cytotoxic T lymphocytes, the motile protein actin is stained red, while microtubules are stained green
One such cytotoxic T lymphocyte is seen here killing a virally infected fibroblast cell
[VIDEO OF CTL KILLING TARGET]
The value of image annotation
• Images are semantic instruments for capturing aspects of the real world, and form a vital part of the scientific record
• However, the value of digital image information depends upon how easily it can be located, searched for relevance, and retrieved
• Detailed descriptive metadata about such images are essential
• Without them, digital image repositories become little more than meaningless and costly data graveyards
The need for a public scientific image
database for biological research images
• Despite the growth of on-line journals that permit the inclusion of media objects, few of these resources are publicly available without subscription
• Furthermore, such proprietary image stores are not cross-searchable
• Most digital images sit on the hard discs of researchers, unavailable for further research
• There is thus a pressing need for a free publicly available image database, with rich well-structured searchable metadata, that can act as an academic repository for multi-dimensional biological research images
• The BioImage Database seeks to fulfil that need
The BioImage Database
(www.bioimage.org)
BioImage development within ORIEL
The aims of the BioImage Database Project within ORIEL are:
• To provide a searchable database of high-quality scientific images of biological specimens, with detailed supporting metadata on
  • the biological specimen
  • the experimental procedure
  • details of image formation and subsequent digital processing
  • the people involved
  • the metadata curation and provenance
• To integrate such multi-dimensional digital image data with other life science resources in the ‘factual’ and literature databases, accessible from the E-BioSci platform
The organisation of information
within the BioImage Database
• The basic unit of organisation within the BioImage Database is the STUDY, roughly equivalent to a scientific publication
• A STUDY may contain images derived from several experiments
• A BioImage STUDY thus contains one or more IMAGE SETS, each corresponding to a particular scientific experiment or investigation on a particular topic
• Each IMAGE SET contains one or more IMAGES on a common theme
• Such an IMAGE may be 2D, a 3D image stack or a video
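The STUDY / IMAGE SET / IMAGE hierarchy described above can be sketched as a small data model. This is only an illustration in Python, not the BioImage schema: the class names, field names and example values are assumptions, and only the containment rules (one or more IMAGE SETS per STUDY, one or more IMAGES per IMAGE SET, three kinds of IMAGE) come from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ImageKind(Enum):
    """The three image forms named on the slide."""
    TWO_D = "2D"
    STACK_3D = "3D image stack"
    VIDEO = "video"


@dataclass
class Image:
    title: str
    kind: ImageKind


@dataclass
class ImageSet:
    """Images on a common theme, from one experiment or investigation."""
    topic: str
    images: List[Image] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.images:
            raise ValueError("an IMAGE SET must contain at least one IMAGE")


@dataclass
class Study:
    """Roughly equivalent to a scientific publication."""
    title: str
    image_sets: List[ImageSet] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.image_sets:
            raise ValueError("a STUDY must contain at least one IMAGE SET")

    def all_images(self) -> List[Image]:
        """Flatten every IMAGE SET into a single list of IMAGES."""
        return [image for image_set in self.image_sets for image in image_set.images]


# Illustrative values only; not records from the real database.
study = Study(
    title="Example study",
    image_sets=[
        ImageSet(
            topic="Example experiment",
            images=[
                Image("Example micrograph", ImageKind.TWO_D),
                Image("Example time-lapse recording", ImageKind.VIDEO),
            ],
        )
    ],
)
print(len(study.all_images()))
```

Enforcing the "one or more" rules in the constructors means an empty STUDY or IMAGE SET is rejected as soon as it is created, rather than surfacing later during browsing or search.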
Browsing and searching within BioImage
• Users may browse or search the BioImage Database
  • by Study
  • by Image Set
  • or by Image
• For each representation, a thumbnail representative image and basic information are initially presented
• More detailed information is a mouse click away
• Browses and searches may be progressively refined
The BioImage home page
bioimage.org
Note the alternative browse search arrangement
The BioImage advanced search interface
The BioImage search results interface
Design of the BioImage user interfaces
• Our ORIEL partner Ingenta plc gave us great assistance in designing these new user interfaces by undertaking user interviews and by applying information architecture expertise and graphic design skills
  • special thanks to Margaret Hanley, Michael Sullivan, Vicky Buser and Matt deMeis
• Their work resulted in Web page templates and style sheets that we have integrated with the underlying database organization, to permit data within the BioImage Database to be displayed in an attractive and easily searchable / browsable manner
What’s under the hood?
BioImage design criteria
• Our design criteria have included:
  • Design of Web user interfaces of high quality and usability
  • Conformity to international multimedia and Web standards
  • Design of the underlying database structure by incorporation of the latest Semantic Web technology
  • Adoption of a modular data model to permit inclusion of new biological fields, by the design of a BioImage Database ontology that can interact with other domain-specific ontologies
  • Use of third party Open Source components wherever possible
  • Creation of new software tools where necessary
External Open Source software components
Component               Software        Reasons for choice
Relational DBMS         PostgreSQL      Proven scalability. Support for inheritance
Web server              Tomcat          Proven scalability. Sufficient to the task
Application framework   Struts          Provides support for internationalization. Stxx extensions allow us to mix JSP and XSLT
Presentation framework  Sitemesh        Widely adopted and Struts compatible
Ontology standards      DAML+OIL (OWL)  Provides the minimum required functionality. Adopted as an international standard
Ontology development    Jena            The leading Open Source Java-based DAML+OIL compatible parser
External ontologies     GO, NCBI        Provide domain-specific knowledge structures
The BioImage Ontology
• We have developed a specific scientific image ontology, the BioImage Ontology
• This prototype BioImage Ontology is written in DAML+OIL
• It is based on
  • the SUMO Upper Level Ontology, and it uses relevant parts of
  • AAF (the Advanced Authoring Format for multimedia) and of
  • MPEG-7 (the Multimedia Content Description Interface standard)
• The BioImage Ontology permits us to differentiate
  • ‘real world’ objects and events from
  • ‘narrative world’ parameters of image description and interpretation, and from
  • media parameters detailing image recording
Ontology Organizer
• Because no suitable software was available, we have written Ontology Organizer to assist in designing the BioImage Ontology
• Ontology Organizer is a Java Jena-based DAML+OIL ontology constraint propagator and datatype manager
• It permits correct constraint propagation up complex hierarchies, and additionally enables the use of datatypes in a manner not possible with ontology editors such as OilEd
• We have recently released Ontology Organizer as Open Source code on SourceForge (http://sourceforge.net/projects/damlconstraint/)
• Ontology Organizer is a generic tool, useful for any DAML+OIL ontology development, and will shortly be updated to the W3C’s OWL Web Ontology Language
Ontology Organizer
Here we see a portion of the Resource hierarchy in which Human is linked to Title, with datatype and cardinality constraints
OwlBase
• The third new software component that we are currently developing for the BioImage Database is OwlBase, a Java ontology mapping servlet that permits us to combine traditional relational database tables with Semantic Web RDF triples
• Image metadata that fit the traditional relational model (for example Author names, Species, Study Titles, etc.), and that are likely to be objects of frequent searches, will be stored in traditional relational tables
• Data that do not easily fit this model, either because they are hierarchically structured or because they have unpredictable structures, will be stored as a set of Entity-Attribute-Value RDF triples
• What gets stored where is resolved by flags in the BioImage Ontology, and is implemented by the ontology mapping servlet OwlBase
• Like Ontology Organizer, OwlBase is a generic tool, and we hope to release it as Open Source code as soon as it is stable
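The split between relational columns and Entity-Attribute-Value triples can be illustrated with a short sketch. It is written in Python for brevity and is not OwlBase itself (which is a Java servlet). The set of flagged properties, the function name `route_metadata` and the dotted attribute names used for nested values are all assumptions made for the example; in the real system the flags live in the BioImage Ontology.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the storage flags held in the ontology:
# properties listed here go to relational columns, everything else to triples.
RELATIONAL_PROPERTIES = {"author_name", "species", "study_title"}

Triple = Tuple[str, str, str]  # (entity, attribute, value)


def route_metadata(
    entity_id: str, metadata: Dict[str, object]
) -> Tuple[Dict[str, object], List[Triple]]:
    """Split one metadata record into a relational row and a list of triples."""
    row: Dict[str, object] = {}
    triples: List[Triple] = []

    def flatten(attribute: str, value: object) -> None:
        # Nested or repeated values are unfolded into one triple per leaf value.
        if isinstance(value, dict):
            for key, nested in value.items():
                flatten(f"{attribute}.{key}", nested)
        elif isinstance(value, (list, tuple)):
            for item in value:
                flatten(attribute, item)
        else:
            triples.append((entity_id, attribute, str(value)))

    for prop, value in metadata.items():
        if prop in RELATIONAL_PROPERTIES:
            row[prop] = value
        else:
            flatten(prop, value)
    return row, triples


# Illustrative record; identifiers and values are made up for the example.
row, triples = route_metadata(
    "study:1",
    {
        "study_title": "Example study",
        "species": "Homo sapiens",
        "staining": {"actin": "red", "microtubules": "green"},
    },
)
print(row)
print(triples)
```

Frequently searched, flat fields stay in indexed relational columns, while irregular or hierarchical metadata need no schema change because each leaf value simply becomes one more triple.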
The new BioImage Database structure
Relational tables and RDF triple store
Unfinished business in system development
• The SOAP interfaces to the external ontologies have yet to be developed. At present, necessary parts of the external ontologies are imported to the BioImage server in Oxford and used locally
• The RQL query interface from the Query Servlet to the BioImage Ontology, and the XML query results interface from the BioImage Ontology back to the Presentation Servlet, have yet to be developed. (RDF query languages are in their infancy, standards are not yet ratified, and many of the tools we need have yet to be built.) At present, all queries and returns are restricted to SQL
• The management of the Media Store system for the image files themselves needs some improvement
• Other than that, the system is working, and our first release of this new BioImage Database will be made within the next few weeks
• We are anxious to obtain feedback on usability from test users
The submission interfaces
The BioImage Database – metadata submission
• One of the problems with any database requiring complex metadata is the effort involved in submitting the metadata
• We plan to ease this problem for the BioImage Database
  • by simplifying the obligatory metadata requirements as far as possible
  • by personalizing the functionality of the submission interfaces, so that each author has to submit common information (e.g. her address) only once
  • by developing methods of harvesting metadata automatically from marked up journal articles and digital image header files
• We are at present developing our submission interfaces, which should be fully functional in prototype form by the summer
Our initial submission interface
Submission interface development
• Each submitting author has a username and password
• On log-in, the user is presented with a list of previously submitted studies to edit, or the possibility of submitting a new study
• Partial submissions can thus be resumed at a later time
• Predefined parameters will be presented as controlled vocabulary choices from the BioImage Ontology and the appropriate external ontologies in the form of drop-down lists to ease the submission process
• The Struts system in the Submission Servlet will provide datatype and field validation, and will present the submitter with appropriate error messages in case of problems
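The kind of checking described above (required fields, datatypes, and controlled vocabulary choices, each producing a message for the submitter) can be sketched independently of Struts. The example below is plain Python rather than Struts configuration, and every field name, vocabulary entry and message text in it is hypothetical.

```python
from typing import Dict, List

# Hypothetical controlled vocabulary, standing in for terms that the real
# system would draw from the BioImage Ontology and external ontologies.
CONTROLLED_VOCABULARY = {"image_kind": ["2D", "3D image stack", "video"]}

# Hypothetical obligatory fields and their expected datatypes.
REQUIRED_FIELDS = {"study_title": str, "author_name": str, "number_of_images": int}


def validate_submission(form: Dict[str, object]) -> List[str]:
    """Return a list of error messages; an empty list means the form is valid."""
    errors: List[str] = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in form or form[name] in ("", None):
            errors.append(f"{name}: this field is required")
        elif not isinstance(form[name], expected_type):
            errors.append(f"{name}: expected a value of type {expected_type.__name__}")
    for name, allowed in CONTROLLED_VOCABULARY.items():
        if name in form and form[name] not in allowed:
            errors.append(f"{name}: choose one of {', '.join(allowed)}")
    return errors


# Illustrative submission with three deliberate problems.
problems = validate_submission(
    {
        "study_title": "",
        "author_name": "A. Author",
        "number_of_images": "three",
        "image_kind": "4D",
    }
)
for message in problems:
    print(message)
```

Collecting all messages before returning, rather than stopping at the first failure, lets the submitter correct every problem in a single pass.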
Population of the BioImage Database
Populating the new BioImage Database
• We are building contacts with journals, scientific publishers, learned societies and other organizations that hold valuable collections of biological images, with a view to hosting or linking to their content from within the BioImage database
• It should be remembered that BioImage is primarily a metadata database, and the images themselves need not reside on the BioImage server (although thumbnail images and the full metadata must)
• Instead, the high resolution originals can reside in distributed archives, e.g. on publishers’ or societies’ Web sites under their own access control, or under the control of individual authors
Journals
• We have negotiated the regular population of the BioImage Database with images from two major scientific publications, namely The EMBO Journal and The Journal of Microscopy
• We will host both figures from the papers themselves, and also, with the authors’ permission, those additional images from each study for which there is no room in the journal, either because of their dimensionality or because of space constraints
• We are presently seeking to widen this pilot scheme to include other journals, and welcome suggestions
• Later in the ORIEL Project, we will evaluate the usefulness of this service to authors, to editors and referees, and to readers
Publicity for the BioImage Database
BioImage publicity
• In July, Vivien Marx, a freelance science journalist from MIT, wrote a two-page feature in Science Magazine entitled “Beautiful Bioimages”
BioImage publicity
• Nature has published our response to a recent letter about images and scientific illustrations, in which we state how BioImage has the necessary metadata structures to permit accurate image description
Integration of the BioImage Database
with the E-BioSci Platform
BioImage and E-BioSci
• The BioImage Database structure is generic, and could in principle be re-used for other scientific areas (AgroImage, AstroImage, etc.) simply by the incorporation of other domain-specific ontologies
• It is our intention that the BioImage Database will become a preferred repository for all multidimensional biological images associated with the E-BioSci and ORIEL projects
• We hope that E-BioSci will foster a culture of submission of images to the BioImage Database that resembles the present culture of submission of sequence data and crystallographic information to the public databases designed for those data types
• We are now ready to build the necessary reciprocal links between the BioImage Database and the E-BioSci platform
• True to our character as part of a European project, we can easily provide support for multiple languages using Struts
An example of multilingual capabilities
BioImage Database staff at Oxford University
Steffen Lindek, BioImage Development Manager, 1/1/02 - 30/4/02 (a key member of the prototype BioImage team)
Chris Catton, BioImage Development Manager, from 9/4/02 (experience in databases and biological film production)
Simon Sparks, BioImage Java Programmer, from 9/4/02 (recent computer science graduate, with interests in the Semantic Web)
End