Event Report

Report author:
Albert Burger, Adrian Paschke, Paolo Romano, Andrea Splendiani
Event organiser(s):
Albert Burger, Adrian Paschke, Paolo Romano, Andrea Splendiani
Title of event:
Semantic Web Applications and Tools for Life Sciences (SWAT4LS)
Date of event:
28 November 2008
Target Audience:
Researchers, developers and users of Semantic Web Technologies from
the fields of Biology, Bioinformatics and Computer Science
Objectives:
The objectives of this workshop were:
• To present and discuss benefits and limits of the adoption of Semantic Web technologies and
tools in biomedical informatics and computational biology.
• To showcase experiences, information resources, tools development and applications.
• To bring together researchers, both developers and users, from the various fields of Biology,
Bioinformatics and Computer Science.
• To discuss goals, current limits and some real use cases for Semantic Web technologies in
Life Sciences.
Chronology of Event:
The event consisted of:
• 2 invited talks (30 min each)
• 1 tutorial (45 min)
• 8 paper presentations (20 min each)
• 4 poster presentations (5 min each)
• 1 demo presentation (5 min)
• a poster session (45 min)
A brief summary of each presentation, including the invited talks and the tutorial, is given below,
in the order in which they were delivered during the event.
Type of Contribution: Invited Talk
Title: Semantic web technology in translational cancer research
Speaker: Dr Michael Krauthammer
Summary:
Michael Krauthammer received his M.D. degree at the University of Zurich,
Switzerland. After board certification (general practitioner), he obtained a
Ph.D. in biomedical informatics at Columbia University in New York and
joined the Yale Pathology Informatics program in July, 2004. His main
research interests are the design of large scale text and image mining
systems and research in translational informatics.
He is the co-director of the bioinformatics core of the Yale SPORE in skin
cancer, a large translational research program, and member of the Yale
Cancer Center (YCC) informatics steering committee. He is the Yale PI for
adopting caBIG's caTISSUE specimen tracking system across the Yale
Medical Campus, enabling Yale researchers to manage their tissue banks and
share data via the caGRID infrastructure.
He is involved, on a national level, in enabling the collaboration among
existing skin SPORE programs using caBIG technology. The project, termed
"melaGRID" and carried out using semantic web technologies, will
allow for the sharing of clinical, tissue and omics data, and will be
instrumental for performing cross-institutional biomarker studies in
melanoma.
Type of Contribution: Paper Presentation
Title: Semantic Data Integration for Francisella tularensis novicida Proteomic and
Genomic Data
Speaker: Nadia Anwar
Summary:
Abstract: This paper summarises the lessons and experiences gained from a
case study of the application of semantic web technologies to the integration
of data from the bacterial species Francisella tularensis novicida (Fn).
Fn data sources are disparate and heterogeneous, as multiple laboratories
across the world, using multiple technologies, perform experiments to
understand the mechanism of virulence. It is hard to integrate such data, and
this work examines the role of explicitly provided data semantics in data
integration. We test whether the semantic web technologies could be used to
reveal previously unknown connections across the available Fn datasets. We
combined this data with genome data and with public domain annotations
within GO, KEGG and the SUPERFAMILY database. Through this
connected graph of database cross references, we extended the annotations
of an experimental data set by superimposing onto it the annotation graph.
Identifiers used in the experimental data automatically resolved and the data
acquired annotations in the rest of the RDF graph. This happened without
the expensive manual annotation that would normally be required to produce
these links. Other lessons learnt and future challenges that result from this
work are also presented in detail.
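The annotation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every identifier and predicate below is invented for illustration, and plain Python sets of triples stand in for an RDF store. The sketch only shows the idea that, once an experimental graph and an annotation graph are merged, a shared identifier lets the experimental data pick up annotations by following cross references.

```python
# Hypothetical triples (subject, predicate, object); all names are invented.
experimental = {
    ("exp:spot_17", "ex:identifiedAs", "gene:FN_0001"),
    ("exp:spot_42", "ex:identifiedAs", "gene:FN_0002"),
}
annotations = {
    ("gene:FN_0001", "ex:xref", "go:term_A"),
    ("gene:FN_0001", "ex:xref", "kegg:pathway_B"),
    ("gene:FN_0002", "ex:xref", "superfamily:domain_C"),
    ("go:term_A", "ex:label", "example process"),
}


def merge(*graphs):
    """Superimpose graphs: identical identifiers become the same node."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def reachable(graph, start):
    """Return every node reachable from start by following subject -> object."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for subject, _, obj in graph:
            if subject == node and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


graph = merge(experimental, annotations)
acquired = reachable(graph, "exp:spot_17")
print(sorted(acquired))
```

Before the merge, the hypothetical spot `exp:spot_17` reaches only its gene identifier; after the merge it also reaches the cross-referenced annotation nodes, with no manual linking.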
Type of Contribution: Paper Presentation
Title: Use of shared lexical resources for efficient ontological engineering
Speaker: Ernesto Jimenez-Ruiz
Summary:
Abstract: This paper is intended to approach one of the main problems in
ontology engineering: the lack of a shared terminology. Nowadays there
exist several biomedical ontologies describing overlapping domains, but
there is not a clear correspondence between the concepts that are supposed
to be equivalent or just similar. These resources are quite precious but their
integration and further development are expensive. Terminological or lexical
resources may support the ontological development in several stages of the
lifecycle of the ontology including ontology integration and the labeling of
concepts. In this paper we investigate the use of lexical resources during the
ontology lifecycle using the example of the Health-e-Child (HeC) project. We
claim that the proper creation and use of a shared lexicon is a cornerstone
for the successful application of the Semantic Web technology within life
sciences.
Type of Contribution: Paper Presentation
Title: KASBi: Knowledge-Based Analysis in Systems Biology
Speaker: Ismael Navas-Delgado
Summary:
Abstract: The analysis of information in the biological domain is usually
focused on the analysis of data from single on-line data sources.
Unfortunately, studying a biological process requires having access to
dispersed, heterogeneous, autonomous data sources. In this context, an
analysis of the information is not possible without the integration of such
data. This paper describes how KOMF, the Khaos Ontology-based Mediator
Framework, is used to retrieve information and crystallize it in a (persistent)
Knowledgebase. This information could be further analyzed later (by means
of querying and reasoning). These kinds of systems (based on KOMF) will
provide users with very large amounts of information (interpreted as ontology
instances once retrieved), which cannot be managed using traditional main
memory-based reasoners. We propose a methodology for creating persistent
and scalable knowledgebases from sets of OWL instances.
Type of Contribution: Paper Presentation
Title: PathExplorer: Service Mining for Biological Pathways on the Web
Speaker: George Zheng
Summary:
Abstract: We propose to model biological processes using Web services to
address limitations of existing biological representation methodologies. We
apply our Web service mining tool, named PathExplorer, to discover
potentially interesting biological pathways linking service models of biological
processes. The tool uses an innovative approach to identify useful pathways
based on graph-based hints and service-based simulation verifying the user’s
hypotheses.
Type of Contribution: Paper Presentation
Title: GoWeb: A semantic search engine for the life science web
Speaker: Heiko Dietze
Summary:
Abstract: Background: Current search engines are keyword-based. Semantic
technologies promise a next generation of semantic search engines, which
will be able to answer questions. Current approaches either apply natural
language processing to unstructured text or they assume the existence of
structured statements over which they can reason.
Results: Here, we introduce a third approach, GoWeb, which combines
classical keyword-based Web search with text-mining and ontologies to
navigate large result sets and facilitate question answering. We evaluate
GoWeb on three benchmarks of questions on genes and functions, on
symptoms and diseases, and on proteins and diseases. The first benchmark
is based on the BioCreAtivE 1 Task 2 and links 457 gene names with 1352
functions. GoWeb finds 58% of the functional GeneOntology annotations.
The second benchmark is based on 26 case reports and links symptoms with
diseases. GoWeb achieves a 77% success rate, improving on an existing
approach by nearly 20%. The third benchmark is based on 28 questions in
the TREC genomics challenge and links proteins to diseases. GoWeb
achieves a success rate of 79%.
Conclusion: GoWeb’s combination of classical Web search with text-mining
and ontologies is a first step towards answering questions in the biomedical
domain. GoWeb is online at: www.gopubmed.org/goweb
Type of Contribution: Paper Presentation
Title: Close Integration of ML and NLP Tools in BioAlvis for Semantic Search in
Bacteriology
Speaker: Robert Bossy
Summary:
Abstract: This paper focuses on the use of corpus-based machine learning
(ML) methods for fine-grained semantic annotation of text. The state of the
art in semantic annotation in Life Science, as in other technical and scientific
domains, takes advantage of recent breakthroughs in the development of
natural language processing (NLP) platforms. The resources required to run
such platforms include named entity dictionaries, terminologies, grammars
and ontologies. The demand for domain-specific, comprehensive and low
cost resources led to the intensive use of ML methods. The precise
specification of the ML task goal and target knowledge, and the adequate
normalization of the training corpus representation can notably increase the
quality of the acquired knowledge. We argue in this paper that integrated
ML-NLP architectures facilitate such specifications. We illustrate our
demonstration with four representative NLP tasks that are part of the
BioAlvis semantic annotation platform. Their impact on the quality of the
semantic annotation is qualified through the evaluation of an IR application in
Bacteriology.
Type of Contribution: Poster Presentation
Title: Supporting Process Development in Bio-jETI by Model Checking and
Synthesis
Speaker: Anna-Lena Lamprecht
Summary:
Abstract: Bio-jETI is a platform for the intuitive graphical design and execution
of bioinformatics workflows composed from heterogeneous remote services.
In this paper we use a simple phylogenetic analysis process to show how
formal approaches like model checking and process synthesis can be
applied to further support the workflow development in Bio-jETI. To unfold their
full potential these methods need a comprehensive knowledge base about
the domain, containing semantic information about the single services as
well as ontological classifications of the used terms. We outline how to
systematically integrate these semantic web concepts into our framework
and discuss the implications on checking and synthesis.
Type of Contribution: Poster Presentation
Title: Structuring mined knowledge for the support of hypothesis generation in
molecular biology
Speaker: Marco Roos
Summary:
Abstract: Hypothesis generation in the life sciences is an empirical process in
which obtaining and structuring knowledge from literature plays a significant
role. Text mining and Information Extraction techniques are seen as key for
programmatically accessing the knowledge captured in the form of free text.
We describe progress towards an application that supports the task of
generating a hypothesis about biomolecular mechanisms using Semantic
web technologies and a workflow to carry out text mining in a service-oriented
architecture. The output is a semantic model with putative biological
relationships that have been extracted from literature, with each relationship
linked to the corresponding evidence. We present preliminary data that
extends a model for chromatin (de)condensation. The methodology can be
used to bootstrap the process of human-guided construction of semantically
rich biological models using the results of knowledge extraction processes.
Type of Contribution: Poster Presentation
Title: Knowledge management using Wikipedia
Speaker: Seongbin Park
Summary:
Abstract: In this paper, we present an ontology-based system that helps
users manage knowledge using Wikipedia. The system analyzes ontologies
and uses the structural information about the ontologies to re-structure
contents of Wikipedia for better browsing. Using the system, users can
acquire knowledge easily from Wikipedia. We show how the system can be
used for life science applications.
Type of Contribution: Poster Presentation
Title: Knowledge Translation: Computing the query potential of bio-ontologies
Speaker: Christopher Baker
Summary:
Abstract: Online ontologies have become a topic of discussion in a number
of meta-data and content management communities including biology. For a
number of technical and social reasons, domain experts are unable to get
close enough to understand the conceptualization of an ontology they may
wish to reuse or access as a query model. Consequently they face initial
conceptual challenges in how they can contribute to the ontology
development process and what actual benefit they can derive from their
contributions in the short and medium term. To address this need we
report on the KnowleFinder system that summarizes queries that can be built
from an ontology as a query model and translates these into natural language
statements for interpretation by the domain expert. Each natural language
statement can then be submitted as an A-box reasoning query and its
subsequent answer is retrieved. We illustrate the system with a subset of
bio-ontologies.
Type of Contribution: Demo Presentation
Title: BioGateway: Query architecture and visualisation of results
Speaker: Erick Antezana
Summary:
Various Web-based User Interfaces to BioGateway were demonstrated.
Type of Contribution: Tutorial
Title: The W3C Interest Group on Semantic web technologies for Health Care and
Life Sciences
Speaker: M. S. Marshall (W3C HCLS IG co-chair)
Summary:
The W3C Semantic Web for Health Care and Life Sciences Interest Group
(HCLS IG) was recently re-chartered for the next three years to continue its
mission to develop, advocate for, and support the use of Semantic Web
technologies for biological science, translational medicine and health care.
Membership in the group has grown to 89 participants, with a wide range of
representation from industry and academia. The HCLS tutorial discussed the
challenges and opportunities at hand. An overview of the activities of
each of the current task forces in HCLS was provided, along with a
description of how specific Semantic Web technologies are being applied.
Some new developments and the recent Face2Face meeting were also
discussed, as well as how interested parties can participate.
Type of Contribution: Paper Presentation
Title: Structuring the life science resourceome for Semantic Systems Biology:
lessons from the BioGateway project
Speaker: Erick Antezana
Summary:
Abstract: The application of Semantic Web technologies in the life sciences
for data integration is still nascent. We have recently built BioGateway, an
RDF store that integrates all the candidate OBO Foundry ontologies with
other resources such as SWISS-PROT.
In the course of developing BioGateway, we faced challenges that are
common to other projects that involve large datasets in diverse formats. We
present a detailed analysis of the obstacles that had to be overcome in creating
BioGateway. In doing so, we demonstrate the potential of a comprehensive
application of Semantic Web technologies to global biomedical data. The
time is ripe for launching a community effort aiming at a wider acceptance
and application of Semantic Web technologies in the life sciences domain.
We make a public call for the creation of a forum that strives to implement a
truly semantic life science foundation of a type of Systems Biology that we
named Semantic Systems Biology.
Type of Contribution: Paper Presentation
Title: Knowledge Representation for Web Navigation
Speaker: Simon Jupp
Summary:
Abstract: Representations of domain knowledge range from those that are
ontologically formal, semantically rich to those that are ontologically informal
and semantically weak. Representations of knowledge are important in many
tasks, one of which is the support of travel around information spaces
through the identification and linking of concepts in a field. In this paper we
explore how representations of ontologically informal, semantically weak
domain knowledge as captured by the Simple Knowledge Organisation
System (SKOS) can enable a system to take advantage of the large number
of existing ontological representations to support semantic linking of Web-based
information and thus facilitate information travel.
Type of Contribution: Invited Talk
Title: Web 2.0 + Web 3.0 = Web 5.0? Using Ontologies to bring Web Services on
to the Semantic Web
Speaker: Dr Mark Wilkinson
Summary:
Mark is an Assistant Professor of Medical Genetics at the University of
British Columbia, in Vancouver. He is also PI in Bioinformatics at the Heart &
Lung Research Institute at St. Paul's Hospital. His primary research interests
relate to the construction and use of Semantic systems in the biomedical
domain, and in particular the role of mass-collaboration in the development
and maintenance of Semantic Web technologies and frameworks. He is
founder and leader of the BioMoby project and founder and leader of the
SHARE project.
He discussed the BioMoby project and how it opened his eyes to what the
Semantic Web could look like, and what mistakes were made along the way.
He then went on to discuss plans for the next generation of Moby Semantic
Web Services, where he attempts to make Web Service access completely
transparent, such that the "Deep Web" can be queried just like any other
Semantic Web resource.
Type of Contribution: Panel Discussion
Title: If the Semantic Web is so good, how come most people use OBO for
ontologies and perl for data integration?
Panel Chair: Dr Phil Lord
Summary:
Panel members: Michael Krauthammer, M. Scott Marshall, Dave De Roure,
Mark Wilkinson
The panel discussed various pros and cons of Semantic Web technology in
the Life Sciences.
The question was raised whether after some 10 years of SW research there
have been demonstrable application successes in the field. The panel was of
the view that although there is still some way to go, much progress has been
made in developing applications that are useful to Life Science researchers.
Asked whether RDF stores are now sufficiently fast to effectively work with
large RDF datasets (an issue raised during an earlier presentation), the
panel answered that there are indeed very good and scalable RDF storage
solutions available.
With respect to what aspects of SW technology should be funded in the
future by research councils, the issue of Life Science Identifiers was
specifically mentioned.
Event Achievements:
The overall objectives of the workshop were fully met, and in many respects the outcomes of
the event exceeded the original expectations of the organisers.
The Call for Papers for this event resulted in 40 submissions:
• full papers: 24 submitted, 8 accepted for publication;
• poster papers: 15 submitted, 4 accepted for publication;
• demo papers: 1 submitted, 1 accepted for publication.
The Call for Participation resulted in 76 attendees, a considerable success, particularly considering
that the event was organised for the first time, and thus had no previous track record, and was a
stand-alone one-day workshop not co-located with any larger bioinformatics conference. This
also compares favourably with other similar events in this domain. Participation extended considerably
beyond researchers associated with the work presented during the workshop. No doubt, the appeal of
high-quality international invited keynote speakers has contributed to this.
Although we do not have exact statistics as to the distribution of participants' backgrounds, from the
discussion that took place on the day, it was evident that the event was attended not only by
Informatics experts, but also attracted scientists from the Life Sciences, which indeed was one of the
key objectives of the workshop.
Many participants were actively involved in practical discussions on implementation details, including
which tools have proved most effective, specific limits of existing tools and first
interesting biological results.
The panel discussion was a good opportunity to discuss current limits and perspectives of the
adoption of Semantic Web technologies in Life Sciences, showing that we are quickly moving from a
pioneering era, where the real applicability of these technologies was investigated, to a more effective
and productive time, where the first tools are being implemented, starting to demonstrate the actual
benefits of this approach.
A feedback questionnaire was returned by 13.2% of the attendees. The feedback confirms the
great success of the event. Specifically, all replies point out that the objectives of the attendees, which
include the gaining of new knowledge in the area, learning about new e-Research techniques, learning
about other areas of research, dissemination of information of own work, participation in an existing
collaboration, exploring opportunities for new collaborations and networking, have been achieved. Also all
replies state that the event's contribution to their work was: 'more than expected' - 1 reply, 'achieved
expectation' - 3 replies, or was 'positive' - 5 replies. None of the replies described the contribution of
the workshop as either 'not as much as I had hoped' or 'disappointing'.
There were no specific future new research proposals or collaborations discussed as part of the
official programme, but it is evident from the questionnaire feedback that discussions on new
collaborations and networking took place during the event.
Due to the success of this event, a similar event, SWAT4LS 2009, is planned for next year.
Furthermore, a call for a special issue of BMC Bioinformatics based on SWAT4LS is currently being
prepared.
Any Other Observations:
The proceedings of the SWAT4LS workshop are publicly accessible at:
http://www.swat4ls.org/proceedings/
(by using account: guest, password: guest2008).
Presentations given during the workshop are available from:
http://www.nesc.ac.uk/action/esi/contribution.cfm?Title=922
Page 7 of 8
The web site for this workshop can be found here:
http://www.swat4ls.org/
Access to this web page from 1 June 2008 - 3 December 2008: