E-BioSci / ORIEL Open Meeting, 28/4/03
National e-Science Centre, Edinburgh

The BioImage Database

David M. Shotton
Image BioInformatics Laboratory
Department of Zoology
University of Oxford, UK
e-mail: david.shotton@zoo.ox.ac.uk
© David Shotton 2003

The importance of image bioinformatics

Electron micrographs provide ultrastructural and molecular detail
Light micrographs are vital for localization of gene expression and functional analysis
Videos can show dynamic processes as they occur
In these cytotoxic T lymphocytes, the motile protein actin is stained red, while microtubules are stained green
One such cytotoxic T lymphocyte is seen here killing a virally infected fibroblast cell
[VIDEO OF CTL KILLING TARGET]

The value of image annotation

• Images are semantic instruments for capturing aspects of the real world, and form a vital part of the scientific record
• However, the value of digital image information depends upon how easily it can be located, searched for relevance, and retrieved
• Detailed descriptive metadata about such images are essential
• Without them, digital image repositories become little more than meaningless and costly data graveyards

The need for a public scientific image database for biological research images

• Despite the growth of on-line journals that permit the inclusion of media objects, few of these resources are publicly available without subscription
• Furthermore, such proprietary image stores are not cross-searchable
• Most digital images sit on the hard discs of researchers, unavailable for further research
• There is thus a pressing need for a free, publicly available image database, with rich, well-structured, searchable metadata, that can act as an academic repository for multi-dimensional biological research images
• The BioImage Database seeks to fulfil that need

The BioImage Database (www.bioimage.org)

BioImage development within ORIEL

The aims of the BioImage Database Project within ORIEL are:
• To provide a searchable
  database of high-quality scientific images of biological specimens, with detailed supporting metadata on
  - the biological specimen
  - the experimental procedure
  - details of image formation and subsequent digital processing
  - the people involved
  - the metadata curation and provenance
• To integrate such multi-dimensional digital image data with other life science resources in the ‘factual’ and literature databases, accessible from the E-BioSci platform

The organisation of information within the BioImage Database

• The basic unit of organisation within the BioImage Database is the STUDY, roughly equivalent to a scientific publication
• A STUDY may contain images derived from several experiments
• A BioImage STUDY thus contains one or more IMAGE SETS, each corresponding to a particular scientific experiment or investigation on a particular topic
• Each IMAGE SET contains one or more IMAGES on a common theme
• Such an IMAGE may be 2D, a 3D image stack or a video

Browsing and searching within BioImage

• Users may browse or search the BioImage Database
  - by Study
  - by Image Set
  - or by Image
• For each representation, a thumbnail representative image and basic information are initially presented
• More detailed information is a mouse click away
• Browses and searches may be progressively refined

The BioImage home page
bioimage.org
Note the alternative browse / search arrangement

The BioImage advanced search interface

The BioImage search results interface

Design of the BioImage user interfaces

• Our ORIEL partner Ingenta plc gave us great assistance in designing these new user interfaces by undertaking user interviews and by applying information architecture expertise and graphic design skills
  - special thanks to Margaret Hanley, Michael Sullivan, Vicky Buser and Matt deMeis
• Their work resulted in Web page templates and style sheets that we have integrated with the underlying database organization, to permit data within the BioImage Database to be displayed in an
  attractive and easily searchable / browsable manner

What’s under the hood?

BioImage design criteria

• Our design criteria have included:
  - Design of Web user interfaces of high quality and usability
  - Conformity to international multimedia and Web standards
  - Design of the underlying database structure by incorporation of the latest Semantic Web technology
  - Adoption of a modular data model to permit inclusion of new biological fields, by the design of a BioImage Database ontology that can interact with other domain-specific ontologies
  - Use of third party Open Source components wherever possible
  - Creation of new software tools where necessary

External Open Source software components

Component | Software | Reasons for choice
Relational DBMS | PostgreSQL | Proven scalability. Support for inheritance
Web server | Tomcat | Proven scalability. Sufficient to the task
Application framework | Struts | Provides support for internationalization. Stxx extensions allow us to mix JSP and XSLT
Presentation framework | Sitemesh | Widely adopted and Struts compatible
Ontology standards | DAML+OIL (OWL) | Provides the minimum required functionality. Adopted as an international standard
Ontology development | Jena | The leading Open Source Java-based DAML+OIL compatible parser
External ontologies | GO, NCBI | Provide domain-specific knowledge structures

The BioImage Ontology

• We have developed a specific scientific image ontology, the BioImage Ontology
• This prototype BioImage Ontology is written in DAML+OIL
• It is based on the SUMO Upper Level Ontology, and it uses relevant parts of
  - AAF (the Advanced Authoring Format for multimedia)
  - MPEG-7 (the Multimedia Content Description Interface standard)
• The BioImage Ontology permits us to differentiate between
  - ‘real world’ objects and events
  - ‘narrative world’ parameters of image description and interpretation
  - media parameters detailing image recording

Ontology Organizer

• Because no suitable software was available to assist in designing the BioImage Ontology, we have written Ontology Organizer
• Ontology Organizer is a Java Jena-based DAML+OIL ontology constraint propagator and datatype manager
• It permits correct constraint propagation up complex hierarchies, and additionally enables the use of datatypes in a manner not possible with ontology editors such as OilEd
• We have recently released Ontology Organizer as Open Source code on SourceForge (http://sourceforge.net/projects/damlconstraint/)
• Ontology Organizer is a generic tool, useful for any DAML+OIL ontology development, and will shortly be updated to the W3C’s OWL Web Ontology Language

Ontology Organizer

Here we see a portion of the Resource hierarchy in which Human is linked to Title, with datatype and cardinality constraints

OwlBase

• The third new software component that we are currently developing for the BioImage Database is OwlBase, a Java ontology mapping servlet that permits us to combine traditional relational database tables with Semantic Web RDF triples
• Image metadata that fit the traditional relational model (for example Author names, Species, Study
  Titles, etc), and that are likely to be objects of frequent searches, will be stored in traditional relational tables
• Data that do not easily fit this model, either because they are hierarchically structured or because they have unpredictable structures, will be stored as a set of Entity-Attribute-Value RDF triples
• What gets stored where is determined by flags in the BioImage Ontology, and is implemented by the ontology mapping servlet OwlBase
• Like Ontology Organizer, OwlBase is a generic tool, and we hope to release it as Open Source code as soon as it is stable

The new BioImage Database structure

Relational tables and RDF triple store

Unfinished business in system development

• The SOAP interfaces to the external ontologies have yet to be developed. At present, necessary parts of the external ontologies are imported to the BioImage server in Oxford and used locally
• The RQL query interface from the Query Servlet to the BioImage Ontology, and the XML query results interface from the BioImage Ontology back to the Presentation Servlet, have yet to be developed. (RDF query languages are in their infancy, standards are not yet ratified, and many of the tools we need have yet to be built).
  At present, all queries and returns are restricted to SQL
• The management of the Media Store system for the image files themselves needs some improvement
• Other than that, the system is working, and our first release of this new BioImage Database will be made within the next few weeks
• We are anxious to obtain feedback on usability from test users

The submission interfaces

The BioImage Database: metadata submission

• One of the problems with any database requiring complex metadata is the effort involved in submitting the metadata
• We plan to ease this problem for the BioImage Database
  - by simplifying the obligatory metadata requirements as far as possible
  - by personalizing the functionality of the submission interfaces, so that an author has to submit common information (e.g. her address) only once
  - by developing methods of harvesting metadata automatically from marked up journal articles and digital image header files
• We are at present developing our submission interfaces, which should be fully functional in prototype form by the summer

Our initial submission interface

Submission interface development

• Each submitting author has a username and password
• On log-in, the user is presented with a list of previously submitted studies to edit, or the possibility of submitting a new study
• Partial submissions can thus be resumed at a later time
• Predefined parameters will be presented as controlled vocabulary choices from the BioImage Ontology and the appropriate external ontologies, in the form of drop-down lists, to ease the submission process
• The Struts system in the Submission Servlet will provide datatype and field validation, and will present the submitter with appropriate error messages in case of problems

Population of the BioImage Database

Populating the new BioImage Database

• We are building contacts with journals, scientific publishers, learned societies and other organizations that hold valuable collections of biological images, with a view
  to hosting or linking to their content from within the BioImage Database
• It should be remembered that BioImage is primarily a metadata database, and the images themselves need not reside on the BioImage server (although thumbnail images and the full metadata must)
• Instead, the high resolution originals can reside in distributed archives, e.g. on publishers’ or societies’ Web sites under their own access control, or under the control of individual authors

Journals

• We have negotiated the regular population of the BioImage Database with images from two major scientific publications, namely The EMBO Journal and The Journal of Microscopy
• We will host both figures from the papers themselves and, with the authors’ permission, those additional images from each study for which there is no room in the journal, either because of their dimensionality or because of space constraints
• We are presently seeking to widen this pilot scheme to include other journals, and welcome suggestions
• Later in the ORIEL Project, we will evaluate the usefulness of this service to authors, to editors and referees, and to readers

Publicity for the BioImage Database

BioImage publicity

• In July, Vivien Marx, a freelance science journalist from MIT, wrote a two-page feature in Science Magazine entitled “Beautiful Bioimages”

BioImage publicity

Nature has published our response to a recent letter about images and scientific illustrations, in which we state how BioImage has the necessary metadata structures to permit accurate image description

Integration of the BioImage Database with the E-BioSci Platform

BioImage and E-BioSci

• The BioImage Database structure is generic, and could in principle be re-used for other scientific areas – AgroImage, AstroImage, etc.
  – simply by the incorporation of other domain-specific ontologies
• It is our intention that the BioImage Database will become a preferred repository for all multidimensional biological images associated with the E-BioSci and ORIEL projects
• We hope that E-BioSci will foster a culture of submission of images to the BioImage Database that resembles the present culture of submission of sequence data and crystallographic information to the public databases designed for those data types
• We are now ready to build the necessary reciprocal links between the BioImage Database and the E-BioSci platform
• True to our character as part of a European project, we can easily provide support for multiple languages using Struts

An example of multilingual capabilities

BioImage Database staff at Oxford University

Steffen Lindek, BioImage Development Manager, 1/1/02 - 30/4/02 (a key member of the prototype BioImage team)
Chris Catton, BioImage Development Manager, from 9/4/02 (experience in databases and biological film production)
Simon Sparks, BioImage Java Programmer, from 9/4/02 (recent computer science graduate, with interests in the Semantic Web)

End
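
Illustrative sketch: hybrid storage as described on the OwlBase slide

The OwlBase slide says that flags in the BioImage Ontology decide whether an item of metadata goes into a relational table or into a set of Entity-Attribute-Value triples. The Python sketch below is one way to picture that routing; it is not OwlBase itself, which is a Java servlet. The flag names, the property names, the `HybridStore` class and the rule that unflagged properties default to the triple store are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical storage flags. The slides say only that flags in the
# BioImage Ontology determine where each item is stored.
RELATIONAL = "relational"
TRIPLE = "triple"

# Assumed flag assignments for a few frequently searched properties.
STORAGE_FLAGS = {
    "author_name": RELATIONAL,
    "species": RELATIONAL,
    "study_title": RELATIONAL,
}


@dataclass
class HybridStore:
    """Toy stand-in for a relational table plus an EAV triple store."""
    rows: dict = field(default_factory=dict)     # study_id -> {column: value}
    triples: list = field(default_factory=list)  # (entity, attribute, value)

    def put(self, study_id, prop, value):
        # Assumption: any property without a flag goes to the triple store.
        if STORAGE_FLAGS.get(prop, TRIPLE) == RELATIONAL:
            self.rows.setdefault(study_id, {})[prop] = value
        else:
            self.triples.append((study_id, prop, value))


store = HybridStore()
store.put("study:1", "species", "Homo sapiens")
store.put("study:1", "study_title", "CTL killing of infected fibroblasts")
store.put("study:1", "staining/actin", "red")
store.put("study:1", "staining/microtubules", "green")
```

Here the species and study title land in the relational row for `study:1`, while the two staining details, which have no fixed column, are kept as triples. A real implementation would read the flags from the ontology rather than from a hard-coded dictionary.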