APPLICATION OF ONTOLOGIES IN CANCER NANOTECHNOLOGY RESEARCH
SEMANTIC WEB ARTICLE
Student: Andreea Buga
Group: 1241E – FILS
Coordinating Teacher: Maria Iuliana Dascalu

PART 1
INTRODUCTION: Problem statement and definition.
LOOKING DEEPER INTO ONTOLOGIES: A brief presentation of ontologies and the way they can improve different processes and activities in working with data.
ONTOLOGIES AND BIOMEDICAL RESEARCH: The importance of ontologies in biomedical research and the areas in which they are applied.
HOW NANOTECHNOLOGY, CANCER RESEARCH AND ONTOLOGIES WORK TOGETHER
EXISTING SOLUTIONS
SIMPLE USE CASE
APPLICATIONS
CONCLUSIONS
BIBLIOGRAPHY

INTRODUCTION
We live in the century of information and speed. Every domain has its own knowledge, stored and presented to humans as data. Scientific progress and the development of information systems have led to a large amount of data that has to be processed daily. Life sciences are central to current research, and the latest discoveries are important steps toward a better life in the future. Data mining and analysis are important tools for understanding life processes, formulating new theories and establishing results. One of the most important issues of our daily lives is finding cures for the diseases that increasingly spread and affect us. Cancer research is one of the fields of study that has gained importance due to the impact of the solutions it proposes. Data mining and analysis offer a better understanding of the causes, effects and treatments of cancer. However, the large amount of data to be processed calls for improved tools for classification, taxonomy and hierarchy building.
We will focus on cancer nanotechnology research, an interdisciplinary area that applies nanotechnology methods to the detection, diagnosis and treatment of cancer. Ontologies, which organize the domain-specific vocabulary into a hierarchy, provide the knowledge framework for annotation, knowledge-based search, data mining and interpretation, and diagnosis.

LOOKING DEEPER INTO ONTOLOGIES
Several ontologies have been developed for cancer nanotechnology research in order to structure and efficiently use the information obtained from patients, previous studies and analyses. But what exactly is an ontology, and how can it be used in such applications?
“An ontology may take a variety of forms, but necessarily it will include a vocabulary of terms, and some specification of their meaning. This includes definitions and an indication of how concepts are inter-related which collectively impose a structure on the domain and constrain the possible interpretations of terms.” [1]
Ontologies have been used in collaborative research and in working with databases in several ways. For example, they can provide the specific terminology that specialists and computers use in a given area; they can support the semantic sharing and integration of the data gathered, create logical connections between the data, and enable later search, retrieval and diagnosis based on the stored data. Inference, that is, automatically deriving new relationships from the explicitly stated ones, plays an important role in ensuring the quality of these logical connections.

[1] M. Uschold, M. King, S. Moralee, and Y. Zorgios. The Enterprise Ontology. The Knowledge Engineering Review, 13(1):31-89, 1998.

ONTOLOGIES AND BIOMEDICAL RESEARCH
Biomedicine has a large vocabulary of specific terms related to diseases, symptoms, equipment, treatment and diagnostics. Organizing the available knowledge plays an important role in processing the data, finding analogies, proposing treatments and making inferences. The following definition offers a better view:
“In biomedical research, ontologies are used to represent the knowledge of a specific domain of interest in machine-processible form and to integrate experimental data that is annotated with terms from these ontologies.” [2]

HOW NANOTECHNOLOGY, CANCER RESEARCH AND ONTOLOGIES WORK TOGETHER
“Nanotechnology involves the application of scientific knowledge from a variety of disciplines in science and engineering to understand, manipulate, and control the properties of matter at nanoscale (1-100 nm) size dimensions.” [3]
Nanotechnology solutions have advantages that may overcome problems faced by conventional approaches to cancer treatment and research. One of the main problems of conventional, macroscale approaches is their limited precision when acting on cancer cells. Cancer cells share many characteristics with healthy cells, so drugs that lack specificity may also damage healthy cells. The nanoparticle-based agents used in cancer diagnosis and therapy are called NP-CDTs and will be referred to by this name from here on. In 2005, the U.S. Food and Drug Administration approved an NP-CDT-based treatment for metastatic cancer, and several other such treatments are being tested in clinical trials. Informatics methods are considered useful tools for advancing nanotechnology cancer research. Studies have shown that NP-CDTs are very diverse and may have a wide range of applications. This diversity comes from the many possible modifications of their chemical composition, so even a small change in the chemical properties of such a material produces a new formulation and a new data set. The preclinical evaluation of each NP-CDT also requires extensive experimental characterization, which generates further data.
Even though the volume of NP-CDT data is much smaller than that of genomic data, the richness and the therapeutic and diagnostic relevance of NP-CDT data lead to a combinatorial complexity that exceeds that of genomic data. Inferences and searches over the available data can reveal the problems of some NP-CDT treatments and suggest how they could be reformulated to achieve a beneficial effect.
“Informatics approaches are likely to be valuable for such reformulation efforts, especially if one has access to an integrated resource that yields rich information regarding both the physicochemical and functional properties of NP-CDTs, as well as tumor physiological properties.” [4]
Ontologies will play an important role in developing databases that contain information about cancer and nanotechnology, in drawing inferences, and in helping researchers better understand the relations between diagnostic methods, cancer cells and the applied treatments.

[2] Dennis G. Thomas, Rohit V. Pappu, Nathan A. Baker. NanoParticle Ontology for cancer nanotechnology research. Journal of Biomedical Informatics 44 (2011) 59–74, February 2011.
[3] Nanoscale Science, Engineering and Technology Subcommittee, Committee on Technology, National Science and Technology Council. The National Nanotechnology Initiative Strategic Plan. 2004.
[4] Dennis G. Thomas, Rohit V. Pappu, Nathan A. Baker. NanoParticle Ontology for cancer nanotechnology research. Journal of Biomedical Informatics 44 (2011) 59–74, February 2011.

EXISTING SOLUTIONS
The caNanoLab project stores and shares data generated from characterization studies of nanomaterials used in cancer research and makes these data searchable. However, this database needs a specific vocabulary that allows it to connect and share data with other cancer-related databases. A few terminologies have been developed in the National Cancer Institute Thesaurus and by other organizations, but they are only the beginning of the large vocabulary that still has to be created.
Some existing vocabularies (from bioinformatics, genomics and cancer medicine) can be used to define terms needed in this area, but a specific vocabulary for cancer nanotechnology does not exist. Ontologies, with their machine-interpretable structure, will open the path for communication between researchers from different fields, ensure semantic interoperability between applications and databases, and provide new analytical tools. Such ontologies are needed to represent knowledge for data integration, knowledge-based search, inference and classification.

SIMPLE USE CASE
Consider the following scenario: a chemist has synthesized a dextran-coated nanoparticle and wants a rough prediction of its in vitro and in vivo properties. To obtain it, s/he plans to compare it with nanoparticles whose characterization data are available in a database such as caNanoLab. To make the best predictions, the researcher must identify the nanoparticle that most closely correlates with the dextran-coated nanoparticle. This requires knowing which descriptors can be used to compare nanoparticles, and which descriptors are optimal for determining the type of nanoparticle that correlates most highly with the dextran-coated one. The ontology can provide these descriptors. At the simplest level, if the descriptor is the type of coating material, then classifying nanoparticles by coating material helps identify the most highly correlated classes of nanoparticles, namely the sibling classes and the child classes of dextran-coated nanoparticle. The researcher then only needs to look at nanoparticle data annotated with these ontology classes and compare the results for the nanoparticles identified through the classification in the ontology.
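The sibling and child lookup described in this use case can be sketched in a few lines of Python. This is a minimal, hypothetical illustration: the class names, the small "is a" hierarchy and the annotated records are invented for the example (they are not actual NPO terms or caNanoLab entries), and the hierarchy is held in a plain child-to-parent dictionary instead of being loaded from an OWL file.

```python
# Hypothetical sketch: find sibling and child classes of a query class in an
# "is a" hierarchy, then select database records annotated with those classes.
from collections import defaultdict
from typing import Dict, List

# Invented hierarchy fragment (child -> single asserted parent); not real NPO terms.
PARENT: Dict[str, str] = {
    "coated nanoparticle": "nanoparticle",
    "polymer-coated nanoparticle": "coated nanoparticle",
    "polysaccharide-coated nanoparticle": "polymer-coated nanoparticle",
    "dextran-coated nanoparticle": "polysaccharide-coated nanoparticle",
    "chitosan-coated nanoparticle": "polysaccharide-coated nanoparticle",
    "carboxymethyl-dextran-coated nanoparticle": "dextran-coated nanoparticle",
    "PEG-coated nanoparticle": "polymer-coated nanoparticle",
}

# Invented characterization records, each annotated with one ontology class.
RECORDS = [
    {"id": "np-001", "class": "carboxymethyl-dextran-coated nanoparticle"},
    {"id": "np-002", "class": "chitosan-coated nanoparticle"},
    {"id": "np-003", "class": "PEG-coated nanoparticle"},
]


def children_index(parent_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Invert the child -> parent map into parent -> list of children."""
    children: Dict[str, List[str]] = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def descendants(cls: str, children: Dict[str, List[str]]) -> List[str]:
    """All classes below `cls` in the hierarchy (iterative depth-first walk)."""
    found: List[str] = []
    stack = list(children.get(cls, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return found


def related_classes(cls: str, parent_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Child (and deeper) classes plus sibling classes of `cls`."""
    children = children_index(parent_of)
    parent = parent_of.get(cls)
    siblings = [c for c in children.get(parent, []) if c != cls] if parent else []
    return {"children": descendants(cls, children), "siblings": siblings}


def candidate_records(cls: str, parent_of: Dict[str, str], records: List[dict]) -> List[str]:
    """IDs of records annotated with a sibling or descendant class of `cls`."""
    related = related_classes(cls, parent_of)
    wanted = set(related["children"]) | set(related["siblings"])
    return [r["id"] for r in records if r["class"] in wanted]


if __name__ == "__main__":
    print(related_classes("dextran-coated nanoparticle", PARENT))
    print(candidate_records("dextran-coated nanoparticle", PARENT, RECORDS))  # ['np-001', 'np-002']
```

With a real ontology, the hierarchy would presumably be read from the NPO OWL file and the records from the annotated database; the lookup itself would stay essentially the same.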
Better predictive models, such as Structure-Activity Relationship (SAR) models, can be developed using data annotated and integrated through an optimized set of ontology-based descriptors for each case in question.

PROPOSED SOLUTION
This paper analyzes the NanoParticle Ontology (NPO) proposed by D. Thomas, R. Pappu and N. Baker in their article. The first step is to create a list of the terms used in nanotechnology to describe NP-CDTs. This list of terms and their definitions already shows the complexity of the classes defined and related in the ontology. After the terms were chosen and defined, the following conclusion regarding nanoparticle structure was drawn:
“In general, a nanoparticle formulation consists of chemical components that can be enumerated as 1) nanoparticles, 2) active chemical constituents, which are part of the chemical makeup of the nanoparticle, and 3) active chemical components which functionalize the nanoparticle. There can be one or more types of nanoparticle in a nanoparticle formulation, depending upon the nanoparticle structure, function or chemical composition. All of the chemical components can be described by their molecular structure, biochemical role, or function. Besides enumerating and describing the chemical components, one needs to describe the types of chemical linkages (e.g., amide linkage, disulphide linkage, encapsulation, etc.) that exist between them, and thereby provide an overall qualitative description of the chemical composition in a nanoparticle formulation.
Other descriptions include: physical locations of chemical components in a nanoparticle (e.g., core, surface, etc.); shape of the nanoparticle (e.g., spherical, cylindrical, etc.); physical state of the formulation (e.g., emulsion, hydrogel, etc.); physical, chemical or functional properties of the active chemical components / constituents (e.g., organic, hydrophilic, magnetic, etc.); intended functions and applications of chemical components for cancer treatment, diagnosis and therapy; underlying mechanisms guiding the design of the nanoparticle formulation (e.g., endocytosis, active targeting, etc.), and; type of stimulus that may be required for activating nanoparticles (e.g., magnetic field, ultrasound, pH change, etc.) and the nanoparticle's response to the stimulus (e.g., drug release from nanoparticle in response to magnetic field stimulus, heat generation from nanoparticle in response to stimulus of infrared light, etc.).” [2]

Nanoparticle representation in NPO
The ontology is built on the Basic Formal Ontology (BFO) framework and implemented in OWL following well-defined ontology principles, of which we highlight the following:
1. Principle of unbiased representation: Following BFO design principles, any term in the ontology should represent an entity as it is known in reality, not from the biased view of an individual.
2. Principle of asserted single “is a” inheritance: Again following BFO principles, each term should have no more than one parent term in the asserted OWL hierarchy. This makes the ontology easily extensible and interoperable with other ontologies that have a formal structure.
3. Principle of inferred multiple “is a” inheritance: Multiple parent-child relationships for a term are not present in the asserted hierarchy. However, a term can have more than one parent in the inferred hierarchy, which is constructed by running an OWL reasoner on the asserted hierarchy. The rules for inferring these relationships are expressed in OWL description logic and specified in the ontology as necessary-and-sufficient or necessary conditions; the reasoner uses these OWL expressions to build the inferred hierarchy.
4. Preferred name and textual definition: Every OWL class and OWL property (object, datatype) must have a preferred name and a textual definition, given through the NPO's OWL annotation properties “preferred name” and “definition”.
5. Synonym: If a class or OWL property has multiple names, these must be provided as synonyms using the NPO's “synonym” OWL annotation property.
6. Code: Every class must have an identification code that starts with the prefix “NPO_” (e.g., NPO_100).
7. rdf:ID and rdfs:label: Every class specifically defined in the NPO must have its NPO code as its rdf:ID. The rdf:ID of every class borrowed from an external ontology in the OBO Foundry list must be preserved in the NPO. Every class in the NPO must also have its preferred name as its rdfs:label.

The result was an ontology with 1,564 classes, 45 object properties specifying class-level associations, and 5 OWL annotation properties (definition, synonym, code, preferred name, dBXreflId). All domain-specific entities are classified under the BFO classes, with Entity as the topmost class. Independent entities include Nanomaterial, MolecularEntity, Instrument, Material Site and Material Boundary, while other entities relate to biological processes, such as Molecular Function, Nanoparticle Response to Stimulus and Tumor Targeting. The large number of entities in this ontology reflects the many scientific domains that interact in cancer nanotechnology.
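To make principles 2, 3, 4 and 6 more concrete, the following toy Python model keeps exactly one asserted parent per class and derives additional parents from "defined" classes whose conditions are necessary and sufficient. It is a hypothetical sketch, not an OWL reasoner and not part of NPO: it only handles a single asserted genus plus a conjunction of (property, value) restrictions, whereas real OWL reasoners such as HermiT or Pellet work with much richer description-logic expressions. All codes, names, definitions and the `has_property` relation are invented for the example.

```python
# Toy model of NPO-style principles (hypothetical; not an OWL reasoner).
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Condition = Tuple[str, str]  # (property, value), e.g. ("has_property", "magnetic")


@dataclass(frozen=True)
class OntClass:
    code: str                     # identifier, expected to start with "NPO_"
    preferred_name: str
    definition: str
    parent: Optional[str] = None  # at most ONE asserted parent (principle 2)
    conditions: FrozenSet[Condition] = frozenset()
    defined: bool = False         # True: conditions are necessary AND sufficient


def check_conventions(cls: OntClass) -> List[str]:
    """Report violations of the naming rules (principles 4 and 6)."""
    problems = []
    if not cls.code.startswith("NPO_"):
        problems.append(f"{cls.code}: code must start with 'NPO_'")
    if not cls.preferred_name or not cls.definition:
        problems.append(f"{cls.code}: missing preferred name or definition")
    return problems


def ancestors(code: str, classes: Dict[str, OntClass]) -> List[str]:
    """Follow the single asserted parent chain up to the root."""
    result = []
    parent = classes[code].parent
    while parent is not None:
        result.append(parent)
        parent = classes[parent].parent
    return result


def inherited_conditions(code: str, classes: Dict[str, OntClass]) -> Set[Condition]:
    """A class's own necessary conditions plus those of its asserted ancestors."""
    conds = set(classes[code].conditions)
    for a in ancestors(code, classes):
        conds |= classes[a].conditions
    return conds


def inferred_parents(code: str, classes: Dict[str, OntClass]) -> Set[str]:
    """Asserted parent plus every defined class whose genus (asserted parent) is an
    ancestor of `code` and whose conditions `code` satisfies (principle 3)."""
    parents = {classes[code].parent} - {None}
    anc = set(ancestors(code, classes))
    conds = inherited_conditions(code, classes)
    for other in classes.values():
        if (other.defined and other.code != code and other.code not in anc
                and other.parent in anc and other.conditions <= conds):
            parents.add(other.code)
    return parents


MAGNETIC = frozenset({("has_property", "magnetic")})
CLASSES = {c.code: c for c in [
    OntClass("NPO_0001", "nanoparticle", "A particle with nanoscale (1-100 nm) dimensions."),
    OntClass("NPO_0002", "metal oxide nanoparticle", "A nanoparticle made of a metal oxide.",
             parent="NPO_0001"),
    OntClass("NPO_0003", "iron oxide nanoparticle", "A metal oxide nanoparticle made of iron oxide.",
             parent="NPO_0002", conditions=MAGNETIC),
    OntClass("NPO_0004", "magnetic nanoparticle", "Any nanoparticle that has a magnetic property.",
             parent="NPO_0001", conditions=MAGNETIC, defined=True),
]}

if __name__ == "__main__":
    for c in CLASSES.values():
        print(c.code, check_conventions(c) or "ok")
    print(sorted(inferred_parents("NPO_0003", CLASSES)))  # ['NPO_0002', 'NPO_0004']
```

In this sketch the iron oxide class keeps a single asserted parent (metal oxide nanoparticle), yet it is inferred to also be a magnetic nanoparticle because it satisfies that defined class's sufficient condition. This mirrors how the inferred hierarchy can contain multiple parents while the asserted hierarchy does not.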
[Figure: an inferred parent-child relationship between different nanoparticle classes]

[Figure: a wider view of the ontology, showing the hierarchy related to “nanoparticle formulation” in NPO]

APPLICATIONS
The ontology clearly serves as an important tool in cancer nanotechnology research, for diagnosis, treatment and analysis. However, its application domain is even wider. Researchers need to consult numerous publications and journals, and search results may be irrelevant, wasting scientists' time. An ontology such as NPO helps address these search issues: it provides the needed terminology and extends the search possibilities (synonyms, search by topic, and the associations and relations defined in the ontology). As a result, searches can draw on domain knowledge from cancer nanotechnology, and interoperability across domains increases. Data indexing, retrieval and integration can be performed using NPO annotations. This is an important step in data mining and knowledge discovery and will be essential for future research progress.

CONCLUSIONS
The aim of this paper is to present the advantages of an ontology developed for cancer nanotechnology research. The ontology is founded on BFO and implemented in OWL. The knowledge embedded in it relates to the chemical composition, properties and preparation of nanomaterials, and it is also connected to other cancer research databases. We have seen the importance of the ontology in establishing connections with other domains, making inferences and helping scientists advance their research. Knowledge-based search, logical connections, semantic integration and data mining are first steps toward future technologies and may be key factors in the discovery of new treatments, predictions and studies related to cancer.

BIBLIOGRAPHY
M. Uschold, M. King, S. Moralee, and Y. Zorgios. The Enterprise Ontology. The Knowledge Engineering Review, 13(1):31-89, 1998.
W3C. Inference. http://www.w3.org/standards/semanticweb/inference (accessed 09.05.2013, 11:38 AM).
Dennis G. Thomas, Rohit V. Pappu, Nathan A. Baker. NanoParticle Ontology for cancer nanotechnology research. Journal of Biomedical Informatics 44 (2011) 59–74, February 2011.
http://www.nano-ontology.org/ (accessed 09.05.2013, 1:37 PM).
Paulo J. G. Lisboa, Liverpool John Moores University, UK. Data Mining in Cancer Research. IEEE Computational Intelligence Magazine, February 2010.