Event Report

Report author: Albert Burger, Adrian Paschke, Paolo Romano, Andrea Splendiani
Event organiser(s): Albert Burger, Adrian Paschke, Paolo Romano, Andrea Splendiani
Title of event: Semantic Web Applications and Tools for Life Sciences (SWAT4LS)
Date of event: 28 November 2008
Target Audience: Researchers, developers and users of Semantic Web Technologies from the fields of Biology, Bioinformatics and Computer Science

Objectives: This workshop's objectives were
• To present and discuss benefits and limits of the adoption of Semantic Web technologies and tools in biomedical informatics and computational biology.
• To showcase experiences, information resources, tools development and applications.
• To bring together researchers, both developers and users, from the various fields of Biology, Bioinformatics and Computer Science.
• To discuss goals, current limits and some real use cases for Semantic Web technologies in Life Sciences.

Chronology of Event: The event consisted of:
• 2 invited talks (30 min each)
• 1 tutorial (45 min)
• 8 paper presentations (20 min each)
• 4 poster presentations (5 min each)
• 1 demo presentation (5 min)
• a poster session (45 min)

A brief summary of each of the presentations (including invited talks and tutorial) is given below, listed in the chronological order in which they were delivered during the event.

Type of Contribution: Invited Talk
Title: Semantic web technology in translational cancer research
Speaker: Dr Michael Krauthammer
Summary: Michael Krauthammer received his M.D. degree at the University of Zurich, Switzerland. After board certification (general practitioner), he obtained a Ph.D. in biomedical informatics at Columbia University in New York and joined the Yale Pathology Informatics program in July, 2004. His main research interests are the design of large scale text and image mining systems and research in translational informatics.
He is the co-director of the bioinformatics core of the Yale SPORE in skin cancer, a large translational research program, and a member of the Yale Cancer Center (YCC) informatics steering committee. He is the Yale PI for adopting caBIG's caTISSUE specimen tracking system across the Yale Medical Campus, enabling Yale researchers to manage their tissue banks and share data via the caGRID infrastructure. He is involved, on a national level, in enabling the collaboration among existing skin SPORE programs using caBIG technology. The project, termed "melaGRID", which is being carried out using semantic web technologies, will allow for the sharing of clinical, tissue and omics data, and will be instrumental for performing cross-institutional biomarker studies in melanoma.

Type of Contribution: Paper Presentation
Title: Semantic Data Integration for Francisella tularensis novicida Proteomic and Genomic Data
Speaker: Nadia Anwar
Summary: Abstract: This paper summarises the lessons and experiences gained from a case study of the application of semantic web technologies to the integration of data from the bacterial species Francisella tularensis novicida (Fn). Fn data sources are disparate and heterogeneous, as multiple laboratories across the world, using multiple technologies, perform experiments to understand the mechanism of virulence. It is hard to integrate such data, and this work examines the role of explicitly provided data semantics in data integration. We test whether semantic web technologies could be used to reveal previously unknown connections across the available Fn datasets. We combined this data with genome data and with public domain annotations within GO, KEGG and the SUPERFAMILY database. Through this connected graph of database cross references, we extended the annotations of an experimental data set by superimposing onto it the annotation graph.
Identifiers used in the experimental data resolved automatically, and the data acquired annotations in the rest of the RDF graph. This happened without the expensive manual annotation that would normally be required to produce these links. Other lessons learnt and future challenges that result from this work are also presented in detail.

Type of Contribution: Paper Presentation
Title: Use of shared lexical resources for efficient ontological engineering
Speaker: Ernesto Jimenez-Ruiz
Summary: Abstract: This paper addresses one of the main problems in ontology engineering: the lack of a shared terminology. Nowadays there exist several biomedical ontologies describing overlapping domains, but there is no clear correspondence between the concepts that are supposed to be equivalent or just similar. These resources are quite precious, but their integration and further development are expensive. Terminological or lexical resources may support ontological development in several stages of the lifecycle of the ontology, including ontology integration and the labeling of concepts. In this paper we investigate the use of lexical resources during the ontology lifecycle using the example of the Health-e-Child (HeC) project. We claim that the proper creation and use of a shared lexicon is a cornerstone for the successful application of Semantic Web technology within the life sciences.

Type of Contribution: Paper Presentation
Title: KASBi: Knowledge-Based Analysis in Systems Biology
Speaker: Ismael Navas-Delgado
Summary: Abstract: The analysis of information in the biological domain is usually focused on the analysis of data from single on-line data sources. Unfortunately, studying a biological process requires having access to dispersed, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without the integration of such data.
This paper describes how KOMF, the Khaos Ontology-based Mediator Framework, is used to retrieve information and crystallize it in a (persistent) Knowledgebase. This information could be further analyzed later (by means of querying and reasoning). These kinds of systems (based on KOMF) will provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main memory-based reasoners. We propose a methodology for creating persistent and scalable knowledgebases from sets of OWL instances.

Type of Contribution: Paper Presentation
Title: PathExplorer: Service Mining for Biological Pathways on the Web
Speaker: George Zheng
Summary: Abstract: We propose to model biological processes using Web services to address limitations of existing biological representation methodologies. We apply our Web service mining tool, named PathExplorer, to discover potentially interesting biological pathways linking service models of biological processes. The tool uses an innovative approach to identify useful pathways based on graph-based hints and service-based simulation verifying user’s hypotheses.

Type of Contribution: Paper Presentation
Title: GoWeb: A semantic search engine for the life science web
Speaker: Heiko Dietze
Summary: Abstract: Background: Current search engines are keyword-based. Semantic technologies promise a next generation of semantic search engines, which will be able to answer questions. Current approaches either apply natural language processing to unstructured text or they assume the existence of structured statements over which they can reason. Results: Here, we introduce a third approach, GoWeb, which combines classical keyword-based Web search with text-mining and ontologies to navigate large results sets and facilitate question answering. We evaluate GoWeb on three benchmarks of questions on genes and functions, on symptoms and diseases, and on proteins and diseases.
The first benchmark is based on the BioCreAtivE 1 Task 2 and links 457 gene names with 1352 functions. GoWeb finds 58% of the functional GeneOntology annotations. The second benchmark is based on 26 case reports and links symptoms with diseases. GoWeb achieves a 77% success rate, improving on an existing approach by nearly 20%. The third benchmark is based on 28 questions in the TREC genomics challenge and links proteins to diseases. GoWeb achieves a success rate of 79%. Conclusion: GoWeb’s combination of classical Web search with text-mining and ontologies is a first step towards answering questions in the biomedical domain. GoWeb is online at: www.gopubmed.org/goweb

Type of Contribution: Paper Presentation
Title: Close Integration of ML and NLP Tools in BioAlvis for Semantic Search in Bacteriology
Speaker: Robert Bossy
Summary: Abstract: This paper focuses on the use of corpus-based machine learning (ML) methods for fine-grained semantic annotation of text. The state of the art in semantic annotation in Life Science, as in other technical and scientific domains, takes advantage of recent breakthroughs in the development of natural language processing (NLP) platforms. The resources required to run such platforms include named entity dictionaries, terminologies, grammars and ontologies. The demand for domain-specific, comprehensive and low cost resources led to the intensive use of ML methods. The precise specification of the ML task goal and target knowledge, and the adequate normalization of the training corpus representation can notably increase the quality of the acquired knowledge. We argue in this paper that integrated ML-NLP architectures facilitate such specifications. We illustrate our demonstration with four representative NLP tasks that are part of the BioAlvis semantic annotation platform. Their impact on the quality of the semantic annotation is qualified through the evaluation of an IR application in Bacteriology.
Type of Contribution: Poster Presentation
Title: Supporting Process Development in Bio-jETI by Model Checking and Synthesis
Speaker: Anna-Lena Lamprecht
Summary: Abstract: Bio-jETI is a platform for the intuitive graphical design and execution of bioinformatics workflows composed from heterogeneous remote services. In this paper we use a simple phylogenetic analysis process to show how formal approaches like model checking and process synthesis can be applied to further support workflow development in Bio-jETI. To unfold their full potential these methods need a comprehensive knowledge base about the domain, containing semantic information about the single services as well as ontological classifications of the terms used. We outline how to systematically integrate these semantic web concepts into our framework and discuss the implications for checking and synthesis.

Type of Contribution: Poster Presentation
Title: Structuring mined knowledge for the support of hypothesis generation in molecular biology
Speaker: Marco Roos
Summary: Abstract: Hypothesis generation in the life sciences is an empirical process in which obtaining and structuring knowledge from literature plays a significant role. Text mining and Information Extraction techniques are seen as key for programmatically accessing the knowledge captured in the form of free text. We describe progress towards an application that supports the task of generating a hypothesis about biomolecular mechanisms using Semantic web technologies and a workflow to carry out text mining in a service-oriented architecture. The output is a semantic model with putative biological relationships that have been extracted from literature, with each relationship linked to the corresponding evidence. We present preliminary data that extends a model for chromatin (de)condensation.
The methodology can be used to bootstrap the process of human-guided construction of semantically rich biological models using the results of knowledge extraction processes.

Type of Contribution: Poster Presentation
Title: Knowledge management using Wikipedia
Speaker: Seongbin Park
Summary: Abstract: In this paper, we present an ontology-based system that helps users manage knowledge using Wikipedia. The system analyzes ontologies and uses the structural information about the ontologies to re-structure contents of Wikipedia for better browsing. Using the system, users can acquire knowledge easily from Wikipedia. We show how the system can be used for life science applications.

Type of Contribution: Poster Presentation
Title: Knowledge Translation: Computing the query potential of bio-ontologies
Speaker: Christopher Baker
Summary: Abstract: Online ontologies have become a topic of discussion in a number of meta-data and content management communities including biology. For a number of technical and social reasons, domain experts are unable to get close enough to understand the conceptualization of an ontology they may wish to reuse or access as a query model. Consequently they face initial conceptual challenges in how they can contribute to the ontology development process and what actual benefit they can derive from their contributions in the short and medium term. To address this need we report on the KnowleFinder system, which summarizes the queries that can be built from an ontology used as a query model and translates these into natural language statements for interpretation by the domain expert. Each natural language statement can then be submitted as an A-box reasoning query and its subsequent answer is retrieved. We illustrate the system with a subset of bio-ontologies.
Type of Contribution: Demo Presentation
Title: BioGateway: Query architecture and visualisation of results
Speaker: Erick Antezana
Summary: Various Web-based User Interfaces to BioGateway were demonstrated.

Type of Contribution: Tutorial
Title: The W3C Interest Group on Semantic web technologies for Health Care and Life Sciences
Speaker: M. S. Marshall (W3C HCLS IG co-chair)
Summary: The W3C Semantic Web for Health Care and Life Sciences Interest Group (HCLS IG) was recently re-chartered for the next three years to continue its mission to develop, advocate for, and support the use of Semantic Web technologies for biological science, translational medicine and health care. Membership in the group has grown to 89 participants, with a wide range of representation from industry and academia. The HCLS tutorial discussed the challenges and opportunities at hand. An overview of the activities of each of the current task forces in HCLS was provided, along with a description of how specific Semantic Web technologies are being applied. Some new developments and the recent Face2Face meeting were also discussed, as well as how interested parties can participate.

Type of Contribution: Paper Presentation
Title: Structuring the life science resourceome for Semantic Systems Biology: lessons from the BioGateway project
Speaker: Erick Antezana
Summary: Abstract: The application of Semantic Web technologies in the life sciences for data integration is still nascent. We have recently built BioGateway, an RDF store that integrates all the candidate OBO Foundry ontologies with other resources such as SWISS-PROT. In the course of developing BioGateway, we faced challenges that are common to other projects that involve large datasets in diverse formats. We present a detailed analysis of the obstacles that had to be overcome in creating BioGateway.
In doing so, we demonstrate the potential of a comprehensive application of Semantic Web technologies to global biomedical data. The time is ripe for launching a community effort aiming at a wider acceptance and application of Semantic Web technologies in the life sciences domain. We make a public call for the creation of a forum that strives to implement a truly semantic life science foundation of a type of Systems Biology that we named Semantic Systems Biology.

Type of Contribution: Paper Presentation
Title: Knowledge Representation for Web Navigation
Speaker: Simon Jupp
Summary: Abstract: Representations of domain knowledge range from those that are ontologically formal and semantically rich to those that are ontologically informal and semantically weak. Representations of knowledge are important in many tasks, one of which is the support of travel around information spaces through the identification and linking of concepts in a field. In this paper we explore how representations of ontologically informal, semantically weak domain knowledge as captured by the Simple Knowledge Organisation System (SKOS) can enable a system to take advantage of the large number of existing ontological representations to support semantic linking of Web based information and thus facilitate information travel.

Type of Contribution: Invited Talk
Title: Web 2.0 + Web 3.0 = Web 5.0? Using Ontologies to bring Web Services on to the Semantic Web
Speaker: Dr Mark Wilkinson
Summary: Mark is an Assistant Professor of Medical Genetics at the University of British Columbia, in Vancouver. He is also PI in Bioinformatics at the Heart & Lung Research Institute at St. Paul's Hospital. His primary research interests relate to the construction and use of Semantic systems in the biomedical domain, and in particular the role of mass-collaboration in the development and maintenance of Semantic Web technologies and frameworks. He is founder and leader of both the BioMoby project and the SHARE project. He discussed the BioMoby project and how it opened his eyes to what the Semantic Web could look like, and what mistakes were made along the way. He then went on to discuss plans for the next generation of Moby Semantic Web Services, where he attempts to make Web Service access completely transparent, such that the "Deep Web" can be queried just like any other Semantic Web resource.

Type of Contribution: Panel Discussion
Title: If the Semantic Web is so good, how come most people use OBO for ontologies and perl for data integration?
Panel Chair: Dr Phil Lord
Summary: Panel members: Michael Krauthammer, M. Scott Marshall, Dave De Roure, Mark Wilkinson. The panel discussed various pros and cons of Semantic Web technology in the Life Sciences. The question was raised whether, after some 10 years of SW research, there have been demonstrable application successes in the field. The panel was of the view that although there is still some way to go, much progress has been made in developing applications that are useful to Life Science researchers. Asked whether RDF stores are now sufficiently fast to work effectively with large RDF datasets (an issue raised during an earlier presentation), the panel answered that there are indeed very good and scalable RDF storage solutions available. With respect to what aspects of SW technology should be funded in the future by research councils, the issue of Life Science Identifiers was specifically mentioned.

Event Achievements: The overall objectives of the workshop were fully met, and in many respects the outcomes of the event exceeded the original expectations of the organisers.
The Call for Papers for this event resulted in 40 submissions:
• full papers: 24 submitted, 8 accepted for publication;
• poster papers: 15 submitted, 4 accepted for publication;
• demo papers: 1 submitted, 1 accepted for publication.

The Call for Participation resulted in 76 attendees, a considerable success, particularly considering that the event was organised for the first time, and thus had no previous track record, and was a stand-alone one-day workshop not co-located with any larger bioinformatics conference. This also compares favourably with other similar events in this domain. Participation extended considerably beyond researchers associated with the work presented during the workshop. No doubt, the appeal of high quality international invited keynote speakers contributed to this. Although we do not have exact statistics as to the distribution of participants' backgrounds, from the discussion that took place on the day it was evident that the event was attended not only by Informatics experts, but also attracted scientists from the Life Sciences, which indeed was one of the key objectives of the workshop. Many participants were actively involved in practical discussions on implementation details, including the tools that had proved most effective, specific limits of existing tools and first interesting biological results. The panel discussion was a good opportunity to discuss current limits and perspectives of the adoption of Semantic Web technologies in Life Sciences, showing that we are quickly moving from a pioneering era, where the real applicability of these technologies was investigated, to a more effective and productive time, where the first tools are being implemented and are starting to demonstrate the actual benefits of this approach.

A feedback questionnaire was returned by 13.2% of the attendees. The feedback confirms the great success of the event.
Specifically, all replies point out that the objectives of the attendees were achieved. These objectives included gaining new knowledge in the area, learning about new e-Research techniques, learning about other areas of research, disseminating information about their own work, participating in an existing collaboration, and exploring opportunities for new collaborations and networking. All replies also rated the event's contribution to their work as 'more than expected' (1 reply), 'achieved expectation' (3 replies) or 'positive' (5 replies). None of the replies described the contribution of the workshop as either 'not as much as I had hoped' or 'disappointing'.

There were no specific new research proposals or collaborations discussed as part of the official programme, but it is evident from the questionnaire feedback that discussions on new collaborations and networking took place during the event. Due to the success of this event, a similar event, SWAT4LS 2009, is planned for next year. Furthermore, a call for a special issue of BMC Bioinformatics based on SWAT4LS is currently being prepared.

Any Other Observations: The proceedings of the SWAT4LS workshop are publicly accessible at: http://www.swat4ls.org/proceedings/ (by using account: guest, password: guest2008). Presentations given during the workshop are available from: http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=922
The web site for this workshop can be found here: http://www.swat4ls.org/ .
Access to this web page from 1 June 2008 - 3 December 2008: