Editing the eagle-i Ontology

Printed Documentation Table of Contents Protege ............................................................................................................................................ 1 About Protege ............................................................................................................................. 1 Download and Configuration...................................................................................................... 1 Editing the eagle-i Ontology ....................................................................................................... 4 Extract, Transfer, Load (ETL) ...................................................................................................... 13 Extract, Transform and Load (ETL) ......................................................................................... 13 Timeline ................................................................................................................................ 13 Pre-ETL Workflow ............................................................................................................... 13 Curation Workflow ............................................................................................................... 14 Multiple instances for the same field .................................................................................... 14 Ready for ETL ...................................................................................................................... 15 Post-ETL ............................................................................................................................... 15 Catalyst ......................................................................................................................................... 17 The Catalyst System at Harvard Core Laboratories ................................................................. 17 iii Protege About Protege The eagle-i curators use Protege edit and view the eagle-i ontology. You can view the Ontology using Protege V 4.0.2 but to edit, you must install Protege V4.1. Download the OBI- (Ontology for Biomedical Investigations) friendly Protege V4 for Windows: http://dl.dropbox.com/u/4905123/OBI-Protege41-PC-2010-12-21.zip -orDownload OBI-friendly Protege V4 for Mac: http://dl.dropbox.com/u/4905123/OBI-Protege41-Mac2010-12-21.zip You may also find the following Protege tutorials and documentation helpful: http://protege.stanford.edu/doc/users.html#tutorials http://protege.stanford.edu/doc/owl/getting-started.html http://protege.stanford.edu/doc/owl/getting-started.html http://protege.stanford.edu/doc/users.html#tutorials http://protege.stanford.edu/doc/owl/getting-started.html http://protege.stanford.edu/doc/owl/getting-started.html Copyright © 2011 The eagle-i Consortium. All rights reserved. Download and Configuration OBI Protege 4.1 is the standard editor for eagle-i ontologies. Step 1: Download Protege4 OBI version 1 Printed Documentation 1. Download OBI-friendly Protege4 for: Windows: http://dl.dropbox.com/u/4905123/Protege_Windows.zip or Mac: http://dl.dropbox.com/u/4905123/Protege41MAC.zip 2. Do not install udpates for the plugins, as the plugin versions contained in the download work specifically with OBI. Step 2: Configure Protege4 1. Start Protege 4 and open an ontology. Any ontology will work. 2. A dialog may appear asking you about imported OWL files. Click cancel. Step 3: Set Protege preferences 1. Open the File menu and click Preferences... 2. Click Renderer, and then click Annotations ..., Add 'en, en-US, !' in the languages column (Include the apostrophe.) OBI users may want to change "reasoner" default settings to improve reasoning performance. In the reasoner tab "initialization" , choose only "class members" and under "Displayed inferences" choose only "unsatisfiable classes", "equivalent classes" and "superclasses". No instance reasoning is done, but for most OBI development this isn't necessary. These settings can make a particularly big difference when using Hermit as the reasoner. Open Save to confirm that the 'Use XML Entities' is checked off. To display the class labels rather than the IDs, open Rendere and choose Render entities using annotation values. If the eagle-i classes still display as IDs, open the Renderer and click Annotations ..., then select label. Leave the language column blank and click OK. Click OK when you finish the setting. Step 4: Add the local files for ontologies you want to import. This step adds a set of directories from your local SVN repositories. This example uses the /datamodel/ directory. You need to locate the SVN local copy on your machine. 1 Select File > Ontology Libraries form the menu. 2 Click + located in the top left corner of the dialog window. When prompted, navigate to your local SVN directory and select: /datamodel/trunk/src/ontology/imports 2 Protege 3. Click + icon on the top left corner of the dialog window. You will be prompted with a finder/Explorer window. Navigate to your local SVN directory and select: /datamodel/trunk/src/ontology/external 4. Click + located in the top left corner of the dialog window. When prompted, navigate to your local SVN directory and select: /datamodel/trunk/src/ontology/external/iao 5. Click + located in the top left corner of the dialog window. When prompted, navigate to your local SVN directory and select: /datamodel/trunk/src/ontology/external/iao/external 6. Click + located in the top left corner of the dialog window. When prompted, navigate to your local SVN directory and select:: /datamodel/trunk/src/ontology/external/iao/protégé Your dialog window should look like the following (show me): 3 Printed Documentation 7. Close Protege. 8. Open Protege and open eaglei-app.owl You are ready to edit. Editing the eagle-i Ontology Intro: A unique feature of eagle-i is that the data collection and search tools are completely driven by ontologies. These ontologies are a set of modules that are written in the OWL language and edited and managed in Protege. The following guide will talk you through configuring Protégé, editing the core ontology module, and adding annotations to drive the user interfaces. 4 Protege [4:27:03 PM] Nicole Vasilevsky: Editing the eagle-i Ontology The following steps assume that you have a basic understanding of Protégé. There are seven basic steps when editing the ontology. Tasks to perform these steps are described in detail below. 1) Notify the curators that you are locking ero.owl and eagle-i-app.owl. 2) Update the ontology files in SVN. 3) Modify the Ontology and add new terms to IDs.xlsx Excel spreadsheet. 4) Classify the Ontology. 5) Save the project. 6) Commit to SVN and write out commit message. 7) Send an e-mail to the curators to notify them that you've released the file. Step 1 Notify curators Step 2 Open SVN and update open subversion E-mail curators and include the name of all of the files that you a will be working with. For example: [LOCKING] eagle-i-org-app.owl for editing. Step 2: Update IDs.xls (Change) update to: svn+ssh://orchestra.med.harvard.edu/svn/cbmi/u24/eagle-idev/datamodel/trunk/src/ontology this will update 3 files: the ids.xls. and ero.owl. eagle-i-app.owl and all relevant imports (update entire trunk) Open the IDS file so maybe change text a little to say "to add a new term" [4:39:17 PM] Melissa Haendel: because some will be edits and some will be new 5 Printed Documentation [4:39:20 PM] Melissa Haendel: and this is describing new 1. Find the IDS.xls file, typically located: SVN ->Data model -> src -> ontology -> IDs.xls 2. Enter the term and class in the following columns: ERO ID # A Mireot B choose next consecutive number nothing Label C Enter the name of the term Type D URI E type Class choose next consective 3. Step 3: Modify the Ontology 1. Open Protege. Replace the default URI with http://purl.obolibrary.org/obo/ero.owl. 2. From the Active Ontology tab, click Classes. If necessary, use the Protégé search box, located in the top, right corner of the screen to locate the parent term. 4. Make sure the correct Class is highlighted the, click Subclass to add a Subclass. When the dialog appears, copy and paste the ID (column A) from the Excel spreadsheet into the text box: “Create a new OWL Class” 5. Click OK. The ID appears as a label in the Annotations box. 6. Every annotation must be a datatype, String. To edit the annotation, double click and choose String from the list. 7. Add the term, using all lowercase. 8. Click OK. 9. Chose a definition- make sure it is IAO (hover over name). Chose Type: STRING 10. If necessary, add an alternate term. Chose Type: STRING. Copy and paste the definition you just selected. Make sure it is IAO (hover over name). 11. Enter your name in definition editor. 12. Click Save. Hover over new term and to confirm a valid URI. Best practice is to make only a few changes before committing to SVN. Keep detailed notes of the changes you make and add to commit message in SVN. And, always add a comment describing the changes you've made when committing. 6 Protege Select Reasoner > HermiT 1.3.2 from the menu. Relabeling preferred terms 1. Open http://eagle-i-org/ont/data on the toolbar 2. Use Search to locate the class you want to re-label. Note: Terms that are in bold are from the currently open ontology. Terms that are not bold, were imported from another file (ontology). 3. Create new annotation value. 4. If it isn't already selected, select Constant. 5. Select the preferred eagle-i label. 6. Click OK. 7. Type the name/label (Use capitalization as appropriate) 8. Click OK. 9. Save Step 6. Commit in SVN. 1 Refresh SVN. 2 Select CHANGES. Confirm that everything looks okay and there were no mistakes. 3 Select all red files and commit. Make sure to enter comments. Step: 7 Notify the curators that the file has been unlocked. Email team [UNLOCKING] ero.pprj 7 Printed Documentation Obsoleting a Term Before obsoleting a term it is best practice to check the usage of the terms across the repositories. You can do this using either SPARQL queries or ECDT tools. 1. Open Protege. 2. To see if the term you want to obsolete is used as domain and range for some property or some logical definition, select the term click "Usage" (show me). If it is, remember to replace the obsolete class with the new class at the end of the process or get rid of properties and logical definitions that will be obsoleted with the resource. 3. Rename rdfs:label as 'obsolete_old rdfs:label' (look at a pre-existing obsolete term for example). 4. Add annotation property 'reason for obsolescence' and enter a reason why this term is being obsoleted. 5. Add new parent term, 'OboinOWL:ObsoleteClass'. Click add named class. Search for the obsolete parent. 6. To remove the old parent term in the ontology, select the old parent and then click remove named class. 7. Look under the ObsoleteResource class and recheck the usage of the obsoleted class (#2 above) and fix any property or logical definitions that still include the obsoleted resources. 8. Remember to keep track of data using the obsoleted term, that will need to be migrated. Provide a migration (Excel) file for the build team. Step 4: Classify Ontology 1 8 Select Reasoning > Hermit 1.3.2. and then select Start Reasoner. (show me) Protege Committing to SVN 1) Before committing make sure to run a diff and check for the changes. 2) Write a detailed list of the changes that have been made in the commit message. Annotation Properties In the eagle-I app file we keep all the application specific properties and annotation. The purpose of these annotation properties and their values is to "flag" classes and properties to tell the application (for instance), which class is a Resource collected or which property is the one connecting a resource to a lab. Currently the following properties that can be used: - inClassGroup - inPropertyGroup There is a specific set of values for both inClassGroup and inPropertyGroup ( see https://sites.google.com/a/eagle-i.org/workspace/build-team-workspace/data-modelintegration/applicationspecificannotations for most current values) You can see all these values in the ontology by: 1) Selecting the eagle-i-app-ont ontology as the native ontology in Protégé from the dropdown of active ontology as shown below: 9 Printed Documentation 2) Click on the individuals tab and check each single value (note that there is comments explaining the usage and i the Types” indicates if value applies to classes or properties In order to flag a class or properties with a particular value: 1) Make sure you are adding annotation in the eagle-i-app ontology (see below) 2) Select the class or property form the class or property hierarchy that you want to annotate 3) Select the annotation tab on the right panel and click on add annotation 10 Protege 4) Select inClassGroup or inPropertyGroup from the list (depending what you are annotating) 5) Select ‘Individuals’ and ‘Entity URI’ form the right tabs as shown below: 6) Select the value of annotation from the list. And click OK. For the specifics of the values and meaning of current annotation values please refer to: 11 Printed Documentation https://sites.google.com/a/eagle-i.org/workspace/build-team-workspace/data-modelintegration/applicationspecificannotations 12 Extract, Transfer, Load (ETL) Extract, Transform and Load (ETL) Extract, transform and load (ETL) is a process that can be used to upload large data sets into the eagle-i system. Because of the many and divergent priorities of both teams, ETL is a work process that we must use judiciously and prioritize cooperatively. ETL requires the time of both the curation and build teams, regardless of file size or type; it should be reserved for large or complicated files that would be too time-consuming to enter manually. Timeline  From the day when ETL work begins, ETL will take a minimum of 5 business days to complete.  There is currently a backlog of files to ETL; files within this queue can be prioritized to meet data contributor/lab needs.  Time to ETL a file depends on several factors, including the type of the resources being ETLed (e.g. reagents, organisms), prior experience with that resource type, and ability of the current ontology to capture that resource type.  When discussing ETL with a lab, expectations should be managed appropriately to avoid disappointment or disengagement of the lab. Pre-ETL Workflow When a data contributor has a pledge for a file to ETL or the actual file to ETL, the software implementation team should be contacted via e-mail and provided with:  Lab/PI name  Approximate size/number of records (if known)  File format (CSV exports, Excel files are common and workable)  Any deadlines or dependencies for ETLing this data  Date when the file might be available, if it is just a pledge  Whether the lab will be annotating or editing these records after ETL If the file is urgent, the software implementation team will provide an estimate of ETL date within 1-2 business days. The software implementation team will contact the data contributor and others involved when the file is ETLed and ready for the data contributor/lab, if the ETL date changes, or if any problems with the data are encountered. 13 Printed Documentation Curation Workflow 1. Make note of the following from the pre-ETL workflow:  Any deadlines or dependencies for ETLing the data  Whether the lab will be annotating or editing these records after ETL 2. Check files for appropriate data and required fields:  If data is missing, obtain it from the web, Resource Navigator, or another resource.  Publications: enter the PubMed accession number (PMID; this can be found at http://www.pubmed.com)  Enter the title of the publication.  Enter the PMID number only (do not write “PMID: ###”).  If new terms are needed, write JIRA tickets for term requests. 3. Ensure proper formatting: Capitalize the first letter of words. Capitalize the first letter of first word in resource description, end with period. Move data to resource description if it does not match label in data tool (determine if data should be included). Migrate data into proper template (stored on SVN). Multiple instances for the same field If you have more than one instance for a field, separate them by semicolons. Note: If you use semicolons to separate data in the excel spreadsheet, the data will go into separate fields in the data tool. For example, if you have two techniques, such as Western blot analysis and Northern blot analysis, enter it in the Excel file as: Western blot analysis; Northern blot analysis If you have multiple instances that correspond across several fields, annotate them as shown in the example for construct inserts: Gene/Insert symbol 14 Gene/Insert Entrez Gene ID Insert Species Origin Extract, Transfer, Load (ETL) EFT2 promoter; Gald4(1- 851993; 855828; 147); Pho4(61-99) 850594 Saccharomyces cerevisiae; Saccharomyces cerevisiae; Saccharomyces cerevisiae These data will be entered into the repository as: 1) Construct_Insert_1 Insert_Symbol EFT2 Promoter Construct_Insert_1 Insert_Entrez Gene ID 851993 Construct_Insert_1 Insert Species Origin Saccharomyces cerevisiae 2) Construct_Insert_2 Insert_Symbol Gal4(1-147) Construct_Insert_2 Insert_Entrez Gene ID Construct_Insert_2 Insert Species Origin 855828 Saccharomyces cerevisiae 3) Construct_Insert_3 Insert_Symbol Pho4(61-99) Construct_Insert_3 Insert_Entrez Gene ID 850594 Construct_Insert_3 Insert Species Origin Saccharomyces cerevisiae Ready for ETL 1. Move the file to the “to-be-ETLd-in-draft-state” folder and move into appropriate subfolder (depending on which institution it should be ETLd into). 2. Send an e-mail to the ETL working group. Post-ETL Open the eagle-i data tool to ensure the data is uploaded into the correct fields. This is the list that should be used to confirm no dirty laundry is showing on our instance pages.  Currently (0.6.4 ontology) marked in the AdminData property Group (see here for more description of property groups) - curator_notes - lab_data_format - lab_inventory_system  Privacy sensitive data shouldn't be shown in the dissemination page and these are the ones that should be marked with a PropertyGroup_ContactLocation. These properties are the following: - contact_email - lab_delivery_address - mailing_address - phone_number 15 Printed Documentation  rdfs:comments should also not be shown on instance pages (as per recent bug http://jira.eagle-i.net:8080/browse/DTOOLS-635)  Do not publish the data before checking with the data contributor and software implementation team, as the lab may want to go back and validate the data.  Once the data has been validated by contributor and lab, the data can be published. For really large files, the resources can be published in bulk by the software implementation team upon request. Contact the software implementation team if you have any questions about the process or the status of a particular file. 16 Catalyst The Catalyst System at Harvard Core Laboratories The eagle-i repository now supplies all of the core facility data that is displayed in the Harvard Catalyst (http://cbmi.catalyst.harvard.edu/cores/index.html) database of core facilities at Harvard University. This includes core names, descriptions, addresses, affiliates, websites, and contacts. Catalyst also displays information about any instruments, software and services attached to each core. Curators should follow the same basic guidelines for Harvard cores as they would with any other core facility in another institution. But they should also be aware of the following Harvard-specific guidelines in order for the information to be compliant with Catalyst:  Organization Type: All organizations that should be included in the Catalyst system need to have the type “Core Laboratory.” This is because the Catalyst fields have been mapped to the fields in the eagle-i core laboratory resource form.  Website(s): the URL of the Catalyst record for a particular core should never be included in the eagle-i record’s “Website(s)” field. This is to prevent the Catalyst record from displaying a link to itself.  Lab address: the Catalyst location field is populated by the eagle-i “Mailing Address” field. Curators should ensure that if there is an address for the lab, it is contained in that field. The “Lab Delivery Address” field can also be filled in, but any different information there will not be displayed in Catalyst.  Institutional affiliations: the Catalyst institutions field is populated by the eagle-i primary “Affiliation” field. This field should only contain organizations from the following controlled list of approved Harvard institutions: o Beth Israel Deaconess Medical Center o Brigham and Women's Hospital o Broad Institute o Children's Hospital Boston o Dana-Farber Cancer Institute o Dana-Farber/Harvard Cancer Center o Forsyth Institute o Harvard Faculty of Arts and Sciences o Harvard Medical School o Harvard Pilgrim Healthcare 17 Printed Documentation o Harvard School of Public Health o Harvard Stem Cell Institute o Immune Disease Institute o Joslin Diabetes Center o Massachusetts General Hospital o Partners HealthCare Center for Personalized Genetic Medicine o Schepens Eye Research Institute Any other institutional affiliations should be populated only in the Secondary affiliation field, and will not be displayed in Catalyst.  Naming conventions: to distinguish multiple labs with the same or similar names, all Harvard core laboratories should include the name or abbreviation of their primary affiliate. If the affiliate is already a part of the official name, then nothing further is required. If it isn’t, the abbreviation should be appended to the end of the name in parenthesis, i.e. Name of Lab (Primary affiliate abbreviation). For example... Flow Cytometry Core Laboratory (BWH) BWH Transgenic Mouse Facility Click the Institutions link on the left sidebar on Catalyst to see a list of the current institutions and abbreviations used. In addition, there are some technical steps that need to be taken to keep the Catalyst interface up to date: 1. Before publishing a new core laboratory in the Harvard repository, eagle-i curators will need to contact the appropriate person at Catalyst so that he or she can assign a category for the core. For mapping purposes, eagle-i will need to provide the name of the core exactly as it appears in the eagle-i record as well as the eagle-i URI. 2. Any changes to the name on the eagle-i record will need to be communicated to the appropriate contact at Catalyst so that it can be manually updated in the Catalyst sidebar. (Note: this is a temporary workaround until the system can be synchronized.) 3. Before withdrawing or deleting an eagle-i core record, curators should let the appropriate Catalyst contact know so the core can also be withdrawn from the sidebar listing of core facilities. 18 Catalyst 4. The eagle-i ontology team needs to collaborate with the Catalyst team when making any changes to the ontology which will affect fields that are mapped to the Catalyst records. Copyright © 2011 The eagle-i Consortium. All rights reserved. 19

Editing the eagle-i Ontology

Related documents

Products

Support

Editing the eagle-i Ontology

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib