CHEMICAL MANUFACTURING ONTOLOGY
Chad Stahl
NOVEMBER 26, 2013
UNIVERSITY AT BUFFALO
IE 460/500 Special Topics

Table of Contents
Introduction
Purpose & Resources
Industry Class
Molecular Entity Class
Product Class
Product Class Example
Quality Standard Class
Object Properties
Issues and Solutions

Introduction

The chemical manufacturing industry is one of the largest industries in the modern world, accounting for nearly $3 trillion in sales annually. The United States and European Union are the largest producers of chemicals, which are used for a myriad of purposes, including but not limited to consumer goods, agriculture, manufacturing, construction and other applications in the service industries.
This industry converts many natural raw materials, such as oil, natural gas, air, water and minerals, into over 70,000 different products and applications. With such wide variation in both the inputs and the potential outputs of this industry, an ontological classification of its products is essential for quickly identifying important or critical elements of the manufacturing process or supply chain. As such, the scope of this project is the development of a basic ontological classification of the major sections of the chemical industry, based on the value and/or volume of products produced or consumed.

Purpose & Resources

The purpose of this ontology project was to create what is, to my knowledge, the first ontology based on the chemical manufacturing industry, as a means of providing an organized and universal format for the classification of products. Because no Chemical Manufacturing Ontology exists in this regard, the majority of the classes introduced are pulled from an existing source (the Wikipedia entry on the Chemical Industry and its subsequent sub-industries). However, some class elements were derived from the Chemical Entities of Biological Interest Ontology (also denoted as the ChEBI Ontology) in order to provide acceptable definitions, synonyms and chemical/biological abbreviations. ChEBI is a freely available dictionary of natural or synthetic ‘small’ chemical and biological molecular entities, which are used to intervene in the processes of living organisms. The ChEBI Ontology provided insight into the chemical structuring of certain elements which other sources were unable to provide, allowing for far more accuracy in the created ontology. Without the support provided by the ChEBI Ontology, much of the work done on the Chemical Manufacturing Ontology would not have been possible.

Another source which was critical to the creation of the Chemical Manufacturing Ontology is the Basic Formal Ontology (BFO), developed by Dr. Barry Smith and Pierre Grenon. BFO is an upper-level ontology used for supporting information retrieval, analysis and integration. BFO was extremely valuable in the creation of the Chemical Manufacturing Ontology, both as an aid in cataloguing the chemical manufacturing classes and for the pre-existing structure which it provides. I am grateful to Dr. Smith for providing information on BFO.

The Chemical Manufacturing Ontology used the ChEBI Ontology and BFO to classify and provide better information about certain chemical and/or biological entities. However, it also drew on information from other sources, such as webpages, ensuring the individuality of this ontology project.

Industry Class

As shown in Figure 1 below, all of the major classes (Industry, Product, Role, Quality Standard and Molecular Entity) are placed inside the existing upper-level BFO classes. The Industry class contains sub-classes for all chemical industries that produce, or are involved in the production of, an item in the Product class. Several Object Properties (such as manufacturer_of, manufactures, and uses) link these Industries to the Product classes which they produce or utilize. The Chemical Industries can also be linked to the Quality Standards, listed under the general Quality class, which they use for select Products. An example of this is the Quality Standard class ‘Rx_360Standard’, which is ‘used_by’ (an Object Property) the ‘PharmacologicalIndustry’ class, a sub-class of ‘HealthcareIndustry’. The same general approach can be taken for the ‘manufacturer_of’ and ‘manufactures’ Object Properties, allowing for an in-depth analysis of the chemical manufacturing industry and the products relevant to it.
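The links described above can be pictured as a small set of subject, property, object triples. The following is only a minimal sketch in plain Python, not the ontology's actual OWL serialization. The names Rx_360Standard, used_by, PharmacologicalIndustry, HealthcareIndustry and manufacturer_of come from the text; the PetroleumIndustry pairing and the helper names objects_of and ancestors are assumptions made for illustration.

```python
# Triples of (subject, object property, object).
# The Rx_360Standard triple follows the example in the text; the
# manufacturer_of triple is an assumed pairing used only for illustration.
TRIPLES = {
    ("Rx_360Standard", "used_by", "PharmacologicalIndustry"),
    ("PetroleumIndustry", "manufacturer_of", "LiquefiedPetroleumGasProduct"),
}

# Direct sub-class links inside the Industry class (child -> parent).
SUBCLASS_OF = {
    "PharmacologicalIndustry": "HealthcareIndustry",
    "HealthcareIndustry": "Industry",
    "PetroleumIndustry": "Industry",
}


def objects_of(subject, prop):
    """Return every object linked to `subject` through `prop`, sorted by name."""
    return sorted(o for s, p, o in TRIPLES if s == subject and p == prop)


def ancestors(cls):
    """Walk the sub-class links upward from `cls`, nearest parent first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


print(objects_of("Rx_360Standard", "used_by"))
print(ancestors("PharmacologicalIndustry"))
```

Asking which industries use a quality standard, or which products an industry manufactures, then becomes a lookup over the same structure.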
Figure 1 – Different Chemical Industries listed in the Ontology.

Some industries require product inputs in order to produce their desired outputs. An example is the Industrial Gas products which are used to produce products for the Petroleum Industry. This is an instance in which the ‘uses’ Object Property and its inverse ‘usedIn’ come into play, indicating which products are applied to which industries. Some Industries may need to be added to this framework if the need arises, but this should be a relatively easy process.

Molecular Entity Class

The next major class in the Chemical Manufacturing Ontology is the Molecular Entity class, which deals with the molecular traits which a particular Product may have. The only sub-class of Molecular Entity that currently exists is Ion, covering the ionic traits of an entity: Anion (one or more elementary charges of the electron), Cation (one or more elementary charges of the proton) and Zwitterion (a neutral molecule carrying both a positive and a negative electrical charge). Further items may be added to this class in the future, but for the current product selection the current listing is adequate. Shown below in Figure 2 is the expanded form of the Molecular Entity class:

Figure 2 – Showing the expanded class hierarchy of the Molecular Entity class.

The Molecular Entity class was based on the class of the same name in the ChEBI Ontology, including definitions and other annotation and description information. While only the basic Ion classes were used, it is this paper's recommendation to incorporate other Molecular Entity sub-classes from the ChEBI Ontology if possible. There is a wide range of possible classes which may be needed once more complex chemical entities are added to the ontology, or once further depth of existing or new classes is needed.
However, as stated before, only those classes needed for the basic completion of this ontology project have been added thus far.

Product Class

Perhaps the largest and most important portion of the Chemical Manufacturing Ontology is the Product class, which currently lists several dozen products but has room for potentially every chemical product a user may wish to add. As the name denotes, this class contains the multiple types and variations of chemical products which are created by the potential manufacturers listed in the Industry class. The Product class contains definitions, synonyms, references and other information about a myriad of chemical products, ranging from basic acids and soaps to specialty agricultural and cleaning chemicals. A small section of the Product hierarchy is shown in Figure 3, detailing the complete class-path for a particular branch (ending with the sub-classes of FertilizerProduct), with the Annotation information for the OrganicFertilizerProduct class on the right.

The basic distinction between the four major types of chemical products lies in the use of each product group (such as who will be using the end product), as well as the inputs that a particular product requires for its creation. For example, any product which will usually only be provided to the general public as a standard consumer good (such as things found at a supermarket: soaps, detergents, vitamins, etc.) is classified under the ‘ConsumerProduct’ class. These consumer products may have chemical inputs from another class, such as the ‘BasicChemicalProduct’ class or the ‘SpecialtyChemicalProduct’ class, in order to form the products which can be purchased by any consumer. Alternatively, products found in the ‘SpecialtyChemicalProduct’ class are (as the name suggests) highly specialized for use in a very specific industry.
Examples of this include products of the surfactant industry, which are found under the sub-class ‘SurfactantProduct’ and are generally produced only by this specialized industry, though the end product could be used as an input for a wider range of product classes. This format applies to the remainder of the Product class, with products divided among the four main groups and the end products of each group being utilized in the creation of others.

The last group of products falls under the LifeScienceProduct class, which classifies products provided to major industries dealing essentially in the creation or extension of life (whether plant life, pet life or human life). These products range from simple pesticides and other agricultural chemicals to very complex products used in veterinary or healthcare industry applications. It can be difficult to place certain products into this class accurately, due to a relaxed definition of what exactly qualifies and therefore some overlap with other industrial sectors (mainly the Specialty and Basic Industrial sectors).

Figure 3 – The complete hierarchy for a sub-section of the Product class. Annotation information (label, definition, synonym, etc.) is listed on the right.

Figure 4 – Annotation and Description information for the Organic Fertilizer Product class.

Product Class Example

As the Product class is by and large the most important class in the ontology (reflecting the outputs and inputs of the Chemical Manufacturing Industry, and therefore its most important aspect), it is important to understand how exactly these products can be reflected in the ontology. How a given product class can be used effectively, together with elements from other classes (different products, industries and object properties), can be shown with the Product sub-class ‘Ethanethiol’; the implications are easy to apply to a myriad of other Product sub-classes.
Ethanethiol is a chemical additive to certain petroleum industry products, particularly Liquefied Petroleum Gas (commonly known as propane or butane, and reflected in its own Product sub-class ‘LiquefiedPetroleumGasProduct’). Its purpose is that of an aroma compound, allowing leaks of LPG to be detected before any damage due to toxicity occurs. Here, therefore, we have one product which serves entirely as an input to another product, used in a specific industry and application, and this can easily be reflected using the existing resources of the Chemical Manufacturing Ontology.

Reflecting this in the ontology brings many of the different classes and object properties into play. To start, Ethanethiol can be taken to be an aroma compound and as such falls under that class, as its sole purpose is as an additive to some chemical compound in order to serve as a warning. Next, we use the Object Property ‘usedIn’ to create the expression “usedIn some LiquefiedPetroleumGasProduct”, reflecting the use of Ethanethiol in LPG as its aroma. When viewing the ‘Ethanethiol’ class in the Protégé builder, it will also show that it uses two other chemicals as its own inputs (ethylene and hydrogen sulfide, which are used together to create ethanethiol), and this is reflected by the ‘uses’ expression. From here, through anonymous ancestor assertions, you can see how the Product class ‘Ethanethiol’ is important down the line as an additive to LPG: it is an important aspect in some cooking role and it is used in some food service industry, as both of these reflect the uses of LPG as a fuel in that industry. A role could also be created for products similar to ethanethiol, such as a role for aroma compounds used purely as a warning for dangerous chemical gases; this ontology did not do so, as ethanethiol was the only listed product with that application.
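The chain of assertions in the Ethanethiol example can be traced with a short script. This is a hand-rolled sketch in plain Python, not a description of how Protégé or an OWL reasoner computes anonymous ancestors; it simply follows ‘usedIn’ links as if they chained. The class spellings Ethylene, HydrogenSulfide, CookingRole and FoodServiceIndustry are assumed, since the text names them only in prose, and the function name downstream is hypothetical.

```python
# Existential restrictions written as class -> {property: [filler classes]}.
# Ethanethiol, LiquefiedPetroleumGasProduct, usedIn, uses and hasRole are
# named in the text; the remaining class spellings are assumptions.
RESTRICTIONS = {
    "Ethanethiol": {
        "usedIn": ["LiquefiedPetroleumGasProduct"],
        "uses": ["Ethylene", "HydrogenSulfide"],
    },
    "LiquefiedPetroleumGasProduct": {
        "hasRole": ["CookingRole"],
        "usedIn": ["FoodServiceIndustry"],
    },
}


def downstream(cls, seen=None):
    """Follow usedIn links from `cls` and collect (property, filler) pairs.

    For each class reached through usedIn, its own non-usedIn restrictions
    (such as roles) are collected as well, then the walk continues from it.
    """
    seen = set() if seen is None else seen
    found = []
    for target in RESTRICTIONS.get(cls, {}).get("usedIn", []):
        if target in seen:
            continue  # guard against cycles in the usedIn links
        seen.add(target)
        found.append(("usedIn", target))
        for prop, fillers in RESTRICTIONS.get(target, {}).items():
            if prop != "usedIn":
                found.extend((prop, filler) for filler in fillers)
        found.extend(downstream(target, seen))
    return found


for prop, filler in downstream("Ethanethiol"):
    print(f"Ethanethiol -> {prop} some {filler}")
```

The direct inputs of ethanethiol remain available separately through the ‘uses’ entry of the same table.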
The same basic format can be applied to many of the chemical products listed in this ontology, as the majority of products created in non-consumer roles serve as inputs to manufacturers whose products will cater to some consumer market further downstream in the production process (creating a complicated system of inputs and outputs).

Quality Standard Class

The final critical section of the Chemical Manufacturing Ontology revolves around object and molecular qualities, covered under the broad-ranging and fittingly titled ‘Quality’ main class. This class contains the general qualities and quality standards used for defining or measuring the condition of some particular thing. These qualities can be tied to potentially every product listed in the ontology through a pre-defined Object Property (set in the Object Property Hierarchy tab, which is reviewed later in this paper). The expanded class hierarchy of the Quality class is shown below in Figure 4, and the Annotation information of a member of the Quality Standard class (the Rx-360 Standard used in the Pharmaceutical Industry) is shown in Figure 5. The Chemical Manufacturing Ontology is slightly lacking in sub-classes of the different Quality classes, and it is likely that anyone more knowledgeable in their chemical field would add further quality standards. However, the basic structure could remain unchanged and serve as the outline for any user wishing to add more detail to this class:

Figure 4 – The hierarchy of the Quality class, fully expanded.

Figure 5 – The Annotation and Description information for the Rx-360 Standard class, displaying the use case of this particular quality standard.

Object Properties

The next step in the creation of the Chemical Manufacturing Ontology was to create several Object Properties which can be applied to Products which have either a known manufacturer or a quality/quality standard.
To adequately cover each Product, several of these properties were taken from the ChEBI Ontology, while a few that were not covered in ChEBI were added as needed. Figure 5 below shows the complete hierarchy of the Object Properties, with any Annotation information for the selected Object Property shown on the right:

Figure 5 – Showing the complete hierarchy of the Object Properties, with Annotation information for the selected property (has_manufacturer) shown on the right. Description information (such as inverse properties) is shown on the bottom-right.

In most cases, the ‘uses’ and ‘usedIn’ properties were applied to products which either use, or are used in, some particular industry or other product, rather than introducing other Object Properties to denote this. The hasRole and hasQualityStandard properties are rather self-explanatory, as are their inverses (such as qualityStandardOf). No annotation or description information was added to these Object Properties, as they were taken mainly from either BFO or ChEBI, neither of which lists any other information in this regard. It should be noted that, in the description of each Object Property, if a property had an obvious inverse (as with the ‘hasQualityStandard’ and ‘qualityStandardOf’ properties above), it was included in the description under “inverse Of”.

The Chemical Manufacturing Ontology is a thorough analysis of the chemical manufacturing industry, the products which are created through its processes, the quality standards which are applied to these products, and the molecular qualities which certain products have. Major product branches are explored throughout the ontology, allowing potential future users (such as companies involved in the industry) to fully annotate their product line using the basic structured format provided.
The limitations of the Chemical Manufacturing Ontology mainly concern the class hierarchy and structure. Some smaller but more specialized chemical industries may not be directly included in the current ontology, but they could be added as sub-classes of existing chemical industries or manufacturers. Fortunately, the majority of the more general chemical industries and product categories already exist in this project (although the classes may be lacking in annotation information such as definitions and synonyms), so the addition of specialized sub-classes to suit users’ needs should not be difficult.

The coverage of the Chemical Manufacturing Ontology was measured by the accuracy and inclusiveness of the Product class, as the largest and (arguably) most important piece of the project. As most of the product branches were taken from Wikipedia, it was best to compare this product selection against an independent pre-existing source. In this case, chemistry books were used to cross-reference the selection of products in the ontology against the products listed in the index of each book. If a product was found that could not be classified either through an existing class or through the creation of a sub-class of an existing class, then the ontology was known to be incomplete. The definition of each product and class could also be cross-referenced in this way, to ensure that the Wikipedia and dictionary sources were accurate. Not every product could be included at this stage of the project, but there should be at least a way of adding every chemical a user encounters, either through a sub-class or a superclass. If a user of the ontology then wants to add their own product or chemical to the listing, this should be a relatively easy and painless process.
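The cross-referencing step described above was done by hand, but it can be illustrated with a small sketch. The Python below assumes a hypothetical list of index terms and a handful of Product class names, and it only checks for matching names; deciding whether an unmatched term could be placed under an existing class or a new sub-class still requires human judgment. The function names normalize, uncovered and coverage are made up for this example.

```python
# Hypothetical sample data: a few Product class names and a short book index.
ONTOLOGY_PRODUCTS = {
    "Ethanethiol",
    "OrganicFertilizerProduct",
    "SurfactantProduct",
    "LiquefiedPetroleumGasProduct",
}
BOOK_INDEX = ["ethanethiol", "surfactant product", "sulfuric acid"]


def normalize(term):
    """Lower-case a term and drop spaces and punctuation for comparison."""
    return "".join(ch for ch in term.lower() if ch.isalnum())


def uncovered(index_terms, classes):
    """Return the index terms that have no same-named class in the ontology."""
    known = {normalize(c) for c in classes}
    return [t for t in index_terms if normalize(t) not in known]


def coverage(index_terms, classes):
    """Fraction of index terms matched by name (1.0 for an empty index)."""
    if not index_terms:
        return 1.0
    missing = uncovered(index_terms, classes)
    return 1 - len(missing) / len(index_terms)


print(uncovered(BOOK_INDEX, ONTOLOGY_PRODUCTS))
print(round(coverage(BOOK_INDEX, ONTOLOGY_PRODUCTS), 2))
```

Each term reported as uncovered is a candidate for review: either it fits under an existing class, or it marks a gap in the ontology.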
Issues and Solutions

There were some issues with the approach taken in classifying the four branches of chemical industry products, mainly dealing with multiple inheritance in certain instances. Some chemical industry products are eligible for several different groups in the ontology at once, creating confusion as to which group most accurately encompasses the product. As an example, products like surfactants (soaps) and detergents have both Industrial and Consumer applications, as they are both used by downstream manufacturers after production and sent directly to the mass consumer market (although likely in different grades).

There were two possible ways to resolve this issue, both of which would require major work on the ontology and were not included at this point. The first potential solution was to re-order every product into either an Industrial grade or a Consumer grade, reflecting the differences in product concentration which are prevalent in the chemical manufacturing industry. This could be the easier solution to implement if only the products with multiple inheritance were broken down in this way (although it would create an inconsistency if not every product were labeled this way, multiple inheritance or not).

The second solution would follow the ChEBI Ontology approach more closely, removing very generic terms (e.g. Surfactant, Detergent) and replacing them with instances of each product. These instances would then be assigned to some Role (such as ‘has_role some Soap’ or ‘has_role some Detergent’) in order to remove any cases of multiple inheritance. This route would require a nearly complete overhaul of the existing ontology, but it remains an effective option if multiple inheritance becomes problematic while running the Reasoner.
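The difference between the current layout and the role-based alternative can be sketched as follows. This is an illustration in plain Python under stated assumptions, not part of the ontology: the product names ExampleCleanerA and ExampleCleanerB are placeholders, the single parent ‘Product’ is assumed, and the helper names multiply_inherited and with_role are hypothetical. Only the role names Soap and Detergent and the class names in the first table come from the text.

```python
from dataclasses import dataclass, field

# Current layout (simplified): class -> asserted parent classes.
# Detergent sits under two branches, which is the multiple-inheritance issue.
CURRENT_PARENTS = {
    "Detergent": ["ConsumerProduct", "SpecialtyChemicalProduct"],
    "OrganicFertilizerProduct": ["FertilizerProduct"],
}


@dataclass
class Product:
    """Role-based layout: one asserted parent, any number of has_role fillers."""
    name: str
    parent: str
    roles: set = field(default_factory=set)


def multiply_inherited(parents):
    """Names of classes asserted under more than one parent."""
    return sorted(name for name, ps in parents.items() if len(ps) > 1)


def with_role(products, role):
    """Names of products carrying the given role."""
    return sorted(p.name for p in products if role in p.roles)


# Placeholder products; the generic Detergent and Soap classes become roles.
ROLE_BASED = [
    Product("ExampleCleanerA", "Product", {"Detergent"}),
    Product("ExampleCleanerB", "Product", {"Soap", "Detergent"}),
]

print(multiply_inherited(CURRENT_PARENTS))
print(with_role(ROLE_BASED, "Detergent"))
```

In the role-based form every product has exactly one parent, and questions such as "which products act as detergents" are answered through roles rather than through the class hierarchy.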
In the end, neither option was implemented, but it is my hope that anyone who takes up this ontology project may implement one of these suggestions.