B.3.1. J.F. Aldana Montes, R. Berlanga

An Initial Pilot Experience on Generating Complex Ontology
Instances from Scientific Bibliographies on Real Biological Domains
José F. Aldana Montes(1), Rafael Berlanga-Llavorí(3), Roxana Danger(2), Raul Montañés-Martínez(4),
Mª del Mar Rojano-Muñoz(1), Francisca Sánchez-Jiménez(4)
(1) Khaos Research Group. Department of Computer Languages and Computing Science.
Higher Technical School of Computer Science Engineering. University of Málaga.
Campus de Teatinos. 29071 Malaga. Spain.
(http://Khaos.uma.es; jfam@lcc.uma.es)
(2) University of Oriente, Santiago de Cuba, Cuba; now at TKBG
(roxana@csd.uo.edu.cu)
(3) TKBG. Department of Computer Languages and Computer Systems. Universitat Jaume I, Castellón, Spain
(http://www3.uji.es/~berlanga/; berlanga@uji.es)
(4) ProCel Lab. Department of Molecular Biology and Biochemistry. Faculty of Sciences. University of
Málaga. Campus de Teatinos. 29071 Malaga. Spain. (http://www.bmbq.uma.es/procel; kika@uma.es).
Abstract
In this paper we present a first insight into the generation of complex ontology instances from scientific
bibliographies, such as those in PubMed/PubChem, on a real biological domain: polyamines and histamine.
There is evidence that both are involved in cancer and in other inflammation- and/or angiogenesis-dependent
diseases, but many questions about the molecular processes behind these effects remain unresolved.
Histamine and polyamines have similar chemical structures and metabolic pathways. Furthermore, both are
present in several relevant physiological and pathological situations, and there is indeed some degree of
cross-talk between them. Unfortunately, the available data are widely dispersed throughout the specialized
literature of very different areas of biomedicine.
We address the problem of automatically generating ontology instances from a collection of PDF documents
stored in a bibliographic database. Given a domain ontology that models and describes what we are searching
for, we extract the structure of each document in order to generate a mapping between the ontology and the
document text. Using this mapping, the ontology is then populated with the extracted knowledge.
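The abstract does not say how structure extraction or the ontology-to-text mapping is carried out. The following is only a minimal Python sketch, under our own assumptions, of the three steps it names: recover document structure, map ontology classes onto the text, and create instances from the matches. It assumes the PDF has already been converted to plain text, that sections are delimited by a hypothetical fixed list of headings (`HEADINGS`), and that each ontology class carries hypothetical lexical labels matched as case-insensitive whole words. `extract_structure`, `map_and_populate` and `Instance` are illustrative names, not the authors' method or any library's API.

```python
import re
from dataclasses import dataclass

# Hypothetical section headings used to recover the document structure.
HEADINGS = ("Abstract", "Introduction", "Methods", "Results", "Discussion")


@dataclass(frozen=True)
class Instance:
    cls: str      # ontology class the mention is mapped to
    mention: str  # text span that matched one of the class labels
    section: str  # document section in which the mention was found


def extract_structure(text: str) -> dict[str, str]:
    """Split plain text (already extracted from a PDF) into sections keyed by heading."""
    sections: dict[str, list[str]] = {"Front": []}
    current = "Front"  # text before the first recognised heading
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in HEADINGS:
            current = stripped
            sections.setdefault(current, [])
        else:
            sections[current].append(stripped)
    return {name: " ".join(lines).strip() for name, lines in sections.items()}


def map_and_populate(ontology: dict[str, list[str]],
                     sections: dict[str, str]) -> list[Instance]:
    """Map each class to the text via its labels; create one instance per whole-word match."""
    instances: list[Instance] = []
    for section, body in sections.items():
        for cls, labels in ontology.items():
            for label in labels:
                pattern = r"\b" + re.escape(label) + r"\b"
                for match in re.finditer(pattern, body, flags=re.IGNORECASE):
                    instances.append(Instance(cls, match.group(0), section))
    return instances


# Toy ontology and document (hypothetical data, for illustration only).
ontology = {
    "Enzyme": ["histidine decarboxylase", "HDC"],
    "BiogenicAmine": ["histamine", "polyamine", "polyamines"],
}
doc = """Title line
Abstract
Histidine decarboxylase (HDC) synthesizes histamine.
Results
HDC activity correlated with polyamines."""

sections = extract_structure(doc)
for inst in map_and_populate(ontology, sections):
    print(inst)
```

Whole-word matching keeps the label "polyamine" from firing inside "polyamines"; a realistic system would replace the label lists with a proper terminology resource and the heading list with layout-aware PDF parsing.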
We adopted histidine decarboxylase (HDC), the enzyme responsible for histamine synthesis, as the pilot
molecule to test our knowledge extraction efforts, because we have worked extensively on its expression,
turnover and structure-function relationships, and have developed both the first three-dimensional model of
this enzyme and the first review on it. Moreover, given the nature of protein metabolism, once the instance
generation process has been validated on this enzyme, it could easily be scaled up to any other enzyme,
whether or not it is related to amine metabolism.
Keywords
Semantic Web, Ontology population, Instance Extraction, Biogenic Amines, Histidine decarboxylase