Printed Documentation
Table of Contents
Protege
  About Protege
  Download and Configuration
  Editing the eagle-i Ontology
Extract, Transform, Load (ETL)
  Extract, Transform and Load (ETL)
    Timeline
    Pre-ETL Workflow
    Curation Workflow
    Multiple instances for the same field
    Ready for ETL
    Post-ETL
Catalyst
  The Catalyst System at Harvard Core Laboratories
Protege
About Protege
The eagle-i curators use Protege to edit and view the eagle-i ontology. You can view the ontology using
Protege V4.0.2, but to edit it you must install Protege V4.1.
Download the OBI- (Ontology for Biomedical Investigations) friendly Protege V4 for Windows:
http://dl.dropbox.com/u/4905123/OBI-Protege41-PC-2010-12-21.zip
-or-
Download OBI-friendly Protege V4 for Mac: http://dl.dropbox.com/u/4905123/OBI-Protege41-Mac2010-12-21.zip
You may also find the following Protege tutorials and documentation helpful:
http://protege.stanford.edu/doc/users.html#tutorials
http://protege.stanford.edu/doc/owl/getting-started.html
Copyright © 2011 The eagle-i Consortium. All rights reserved.
Download and Configuration
OBI Protege 4.1 is the standard editor for eagle-i ontologies.
Step 1: Download Protege4 OBI version
1. Download OBI-friendly Protege4 for Windows: http://dl.dropbox.com/u/4905123/Protege_Windows.zip
   or for Mac: http://dl.dropbox.com/u/4905123/Protege41MAC.zip
2. Do not install updates for the plugins; the plugin versions contained in the download work specifically with
OBI.
Step 2: Configure Protege4
1. Start Protege 4 and open an ontology. Any ontology will work.
2. A dialog may appear asking you about imported OWL files. Click cancel.
Step 3: Set Protege preferences
1. Open the File menu and click Preferences...
2. Click Renderer, then click Annotations ... and add 'en, en-US, !' in the languages column.
(Include the apostrophe.)
OBI users may want to change the default reasoner settings to improve reasoning performance. On the
Reasoner tab, under "Initialization", choose only "class members", and under "Displayed inferences" choose
only "unsatisfiable classes", "equivalent classes" and "superclasses". With these settings no instance
reasoning is done, but for most OBI development it isn't necessary. These settings can make a particularly big
difference when using HermiT as the reasoner.
Open Save and confirm that 'Use XML Entities' is checked off.
To display the class labels rather than the IDs, open Renderer and choose Render entities using
annotation values. If the eagle-i classes still display as IDs, open Renderer and click Annotations
..., then select label. Leave the language column blank and click OK.
Click OK when you have finished changing the settings.
Step 4: Add the local files for ontologies you want to import.
This step adds a set of directories from your local SVN repositories. This example uses the
/datamodel/ directory. You need to locate the SVN local copy on your machine.
1. Select File > Ontology Libraries from the menu.
2. Click + located in the top left corner of the dialog window. When prompted, navigate to your
local SVN directory and select:
/datamodel/trunk/src/ontology/imports
3. Click + located in the top left corner of the dialog window. When prompted with a
Finder/Explorer window, navigate to your local SVN directory and select:
/datamodel/trunk/src/ontology/external
4. Click + located in the top left corner of the dialog window. When prompted, navigate to your
local SVN directory and select:
/datamodel/trunk/src/ontology/external/iao
5. Click + located in the top left corner of the dialog window. When prompted, navigate to your
local SVN directory and select:
/datamodel/trunk/src/ontology/external/iao/external
6. Click + located in the top left corner of the dialog window. When prompted, navigate to your
local SVN directory and select:
/datamodel/trunk/src/ontology/external/iao/protégé
Your dialog window should now list the five directories added above.
7. Close Protege.
8. Open Protege and open eaglei-app.owl
You are ready to edit.
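Before adding the directories in Step 4, it can help to confirm that they all exist in your local working copy. The following is a minimal sketch, not part of the eagle-i tooling; the function name and the working-copy location (`~/svn`) are assumptions made for illustration, and the relative paths are copied from Step 4.

```python
from pathlib import Path

# Relative paths copied from Step 4; adjust if your checkout is laid out differently.
LIBRARY_DIRS = [
    "datamodel/trunk/src/ontology/imports",
    "datamodel/trunk/src/ontology/external",
    "datamodel/trunk/src/ontology/external/iao",
    "datamodel/trunk/src/ontology/external/iao/external",
    "datamodel/trunk/src/ontology/external/iao/protégé",
]


def missing_library_dirs(svn_root):
    """Return the Step 4 directories that do not exist under svn_root."""
    root = Path(svn_root)
    return [d for d in LIBRARY_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Hypothetical location of the local SVN working copy.
    for directory in missing_library_dirs(Path.home() / "svn"):
        print("Missing:", directory)
```

An empty result means every directory is in place and can be added through File > Ontology Libraries.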
Editing the eagle-i Ontology
Intro:
A unique feature of eagle-i is that the data collection and search tools are completely driven by
ontologies. These ontologies are a set of modules that are written in the OWL language and
edited and managed in Protege. The following guide will walk you through configuring Protégé,
editing the core ontology module, and adding annotations to drive the user interfaces.
The following steps assume that you have a basic understanding of Protégé.
There are seven basic steps when editing the ontology. Tasks to perform these steps are
described in detail below.
1) Notify the curators that you are locking ero.owl and eagle-i-app.owl.
2) Update the ontology files in SVN.
3) Modify the Ontology and add new terms to the IDs.xlsx Excel spreadsheet.
4) Classify the Ontology.
5) Save the project.
6) Commit to SVN and write a commit message.
7) Send an e-mail to the curators to notify them that you've released the file.
Step 1: Notify the curators
E-mail the curators and include the names of all of the files that you will be working with. For
example: [LOCKING] eagle-i-org-app.owl for editing.
Step 2: Update from SVN and add new terms to IDs.xls
Open Subversion and update to: svn+ssh://orchestra.med.harvard.edu/svn/cbmi/u24/eagle-idev/datamodel/trunk/src/ontology
This updates three files (IDs.xls, ero.owl and eagle-i-app.owl) and all relevant imports (update the entire trunk).
To add a new term, open the IDs file:
1. Find the IDs.xls file, typically located at: SVN -> Data model -> src -> ontology -> IDs.xls
2. Enter the term and class in the following columns:

Column  Heading   What to enter
A       ERO ID #  Choose the next consecutive number
B       Mireot    Nothing
C       Label     Enter the name of the term
D       Type      Type Class
E       URI       Choose the next consecutive
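The "next consecutive number" rule for a new term can be sketched as follows. This is only an illustration: the function names and the dictionary keys are hypothetical, the column assignments follow the description above, and the format of the URI column is not given in this guide, so it is left empty for the editor to fill in.

```python
def next_consecutive_number(existing_numbers):
    """Return the number that follows the largest ID number already in use."""
    return max(existing_numbers, default=0) + 1


def new_term_row(existing_numbers, label):
    """Build the spreadsheet row for a new term (columns A to E)."""
    return {
        "A (ERO ID #)": next_consecutive_number(existing_numbers),
        "B (Mireot)": "",           # left empty for a new term
        "C (Label)": label,         # the name of the term
        "D (Type)": "Class",
        # Column E (URI) also takes the next consecutive value; its exact
        # format is not specified here, so this sketch does not guess it.
        "E (URI)": None,
    }
```

For example, `new_term_row([1, 2, 5], "flow cytometer")` assigns the number 6, because the rule continues from the largest number in use rather than filling gaps.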
Step 3: Modify the Ontology
1. Open Protege. Replace the default URI with http://purl.obolibrary.org/obo/ero.owl.
2. From the Active Ontology tab, click Classes. If necessary, use the Protégé search box,
located in the top right corner of the screen, to locate the parent term.
4. Make sure the correct class is highlighted, then click Subclass to add a subclass. When the
dialog appears, copy and paste the ID (column A) from the Excel spreadsheet into the text box:
“Create a new OWL Class”
5. Click OK. The ID appears as a label in the Annotations box.
6. Every annotation must have the datatype String. To edit the annotation, double-click it and choose
String from the list.
7. Add the term, using all lowercase.
8. Click OK.
9. Choose a definition. Make sure it is IAO (hover over the name). Choose Type: String.
10. If necessary, add an alternate term. Choose Type: String. Copy and paste the definition you
just selected. Make sure it is IAO (hover over the name).
11. Enter your name in definition editor.
12. Click Save. Hover over the new term to confirm that it has a valid URI.
Best practice is to make only a few changes before committing to SVN. Keep detailed notes of the
changes you make, and always include them in the commit message when you commit to SVN.
Select Reasoner > HermiT 1.3.2 from the menu.
Relabeling preferred terms
1. Open http://eagle-i-org/ont/data on the toolbar
2. Use Search to locate the class you want to re-label.
Note: Terms in bold are from the currently open ontology. Terms that are not bold were
imported from another file (ontology).
3. Create new annotation value.
4. If it isn't already selected, select Constant.
5. Select the preferred eagle-i label.
6. Click OK.
7. Type the name/label (Use capitalization as appropriate)
8. Click OK.
9. Save
Step 6: Commit in SVN
1. Refresh SVN.
2. Select CHANGES. Confirm that everything looks okay and that there are no mistakes.
3. Select all red files and commit. Make sure to enter comments.
Step 7: Notify the curators that the file has been unlocked
E-mail the team. For example: [UNLOCKING] ero.pprj
Obsoleting a Term
Before obsoleting a term, it is best practice to check the usage of the term across the repositories.
You can do this using either SPARQL queries or ECDT tools.
1. Open Protege.
2. To see whether the term you want to obsolete is used as the domain or range of a property or
in a logical definition, select the term and click "Usage". If it is, remember to replace
the obsolete class with the new class at the end of the process, or remove the properties and logical
definitions that will be obsoleted with the resource.
3. Rename rdfs:label as 'obsolete_old rdfs:label' (look at a pre-existing obsolete term for an
example).
4. Add the annotation property 'reason for obsolescence' and enter the reason why this term is being
obsoleted.
5. Add the new parent term, 'OboinOWL:ObsoleteClass'. Click add named class. Search for the
obsolete parent.
6. To remove the old parent term in the ontology, select the old parent and then click remove
named class.
7. Look under the ObsoleteResource class and recheck the usage of the obsoleted class (#2
above) and fix any property or logical definitions that still include the obsoleted resources.
8. Remember to keep track of data using the obsoleted term, which will need to be migrated.
Provide a migration (Excel) file for the build team.
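The usage check that precedes obsoleting a term can be expressed as a SPARQL query. The sketch below only builds the query text; how it is submitted to a repository is not covered in this guide. The function name is hypothetical, and any term URI you pass in (for example one under http://purl.obolibrary.org/obo/) is your own input, not a value taken from this document.

```python
def usage_query(term_uri):
    """Build a SPARQL query listing every subject and predicate that points at term_uri.

    This covers instances typed with the class as well as properties that use
    it as a value, because both appear as triples with the term as object.
    """
    if any(ch in term_uri for ch in '<>" {}|\\^`'):
        raise ValueError("not a plain URI: %r" % term_uri)
    return (
        "SELECT DISTINCT ?s ?p\n"
        "WHERE {\n"
        "  ?s ?p <%s> .\n"
        "}" % term_uri
    )
```

An empty result set suggests that no data needs to be migrated for that term; any rows returned are candidates for the migration file.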
Step 4: Classify Ontology
1. Select Reasoner > HermiT 1.3.2, and then select Start Reasoner.
Committing to SVN
1) Before committing, run a diff and check the changes.
2) Write a detailed list of the changes that have been made in the commit message.
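For curators who use the Subversion command line rather than a graphical client, the review-then-commit routine might look like the sketch below. It uses the standard `svn update`, `svn status`, `svn diff` and `svn commit -m` subcommands; the function names are hypothetical, and the commit is kept separate so that the diff can be read first.

```python
import subprocess


def review_commands(working_copy):
    """Commands to run before committing: update, list changes, show the diff."""
    return [
        ["svn", "update", working_copy],
        ["svn", "status", working_copy],
        ["svn", "diff", working_copy],
    ]


def commit_command(working_copy, message):
    """Build the commit command; an empty commit message is rejected."""
    if not message.strip():
        raise ValueError("describe the changes in the commit message")
    return ["svn", "commit", "-m", message, working_copy]


def run(command):
    """Run one svn command and stop if it fails."""
    subprocess.run(command, check=True)
```

A typical session would call `run` on each entry of `review_commands(...)`, read the output, and only then call `run(commit_command(...))` with the list of changes as the message.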
Annotation Properties
The eagle-i app file holds all of the application-specific properties and annotations.
The purpose of these annotation properties and their values is to "flag" classes and properties to
tell the application, for instance, which class is a collected resource or which property
connects a resource to a lab.
Currently, the following properties can be used:
- inClassGroup
- inPropertyGroup
There is a specific set of values for both inClassGroup and inPropertyGroup
(see https://sites.google.com/a/eagle-i.org/workspace/build-team-workspace/data-modelintegration/applicationspecificannotations for the most current values).
You can see all these values in the ontology by:
1) Selecting the eagle-i-app-ont ontology from the active ontology drop-down in Protégé.
2) Clicking the Individuals tab and checking each value. (Note that there are comments
explaining the usage, and “Types” indicates whether the value applies to classes or properties.)
In order to flag a class or property with a particular value:
1) Make sure you are adding the annotation in the eagle-i-app ontology.
2) Select the class or property that you want to annotate from the class or property hierarchy.
3) Select the Annotations tab on the right panel and click add annotation.
4) Select inClassGroup or inPropertyGroup from the list (depending on what you are annotating).
5) Select ‘Individuals’ and ‘Entity URI’ from the tabs on the right.
6) Select the annotation value from the list and click OK.
For the specifics of the current annotation values and their meaning, please refer to:
https://sites.google.com/a/eagle-i.org/workspace/build-team-workspace/data-modelintegration/applicationspecificannotations
Extract, Transform, Load (ETL)
Extract, Transform and Load (ETL)
Extract, transform and load (ETL) is a process that can be used to upload large data sets
into the eagle-i system. Because of the many and divergent priorities of both teams, ETL is
a work process that we must use judiciously and prioritize cooperatively. ETL requires the
time of both the curation and build teams, regardless of file size or type; it should be
reserved for large or complicated files that would be too time-consuming to enter manually.
Timeline
• From the day when ETL work begins, ETL will take a minimum of 5 business
days to complete.
• There is currently a backlog of files to ETL; files within this queue can be
prioritized to meet data contributor/lab needs.
• Time to ETL a file depends on several factors, including the type of the
resources being ETLed (e.g. reagents, organisms), prior experience with that
resource type, and ability of the current ontology to capture that resource type.
• When discussing ETL with a lab, expectations should be managed
appropriately to avoid disappointment or disengagement of the lab.
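When setting expectations with a lab, the 5-business-day minimum can be turned into an earliest possible completion date. The sketch below is an illustration only: it assumes that counting starts on the day after ETL work begins and that only weekends are skipped (holidays are ignored). The function name is hypothetical.

```python
from datetime import date, timedelta


def earliest_completion(start, business_days=5):
    """Return the date reached after counting business_days weekdays past start."""
    current = start
    remaining = business_days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday is 0, Friday is 4
            remaining -= 1
    return current
```

Work that begins on a Monday therefore cannot finish before the following Monday, and the backlog mentioned above can push the real date later.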
Pre-ETL Workflow
When a data contributor has a pledge for a file to ETL or the actual file to ETL, the software
implementation team should be contacted via e-mail and provided with:
• Lab/PI name
• Approximate size/number of records (if known)
• File format (CSV exports and Excel files are common and workable)
• Any deadlines or dependencies for ETLing this data
• Date when the file might be available, if it is just a pledge
• Whether the lab will be annotating or editing these records after ETL
If the file is urgent, the software implementation team will provide an estimate of ETL date
within 1-2 business days.
The software implementation team will contact the data contributor and others involved when the file
is ETLed and ready for the data contributor/lab, if the ETL date changes, or if any problems with the
data are encountered.
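The items requested in the pre-ETL e-mail can be kept together in a small record so that nothing is left out. This is a hypothetical sketch: the class, its field names and the e-mail wording are assumptions, and only the list of items comes from the workflow above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EtlRequest:
    lab_or_pi: str
    file_format: str
    approx_records: Optional[int] = None        # if known
    deadlines: str = ""                         # deadlines or dependencies
    available_on: str = ""                      # only for a pledged file
    lab_will_edit_after_etl: Optional[bool] = None

    def email_body(self):
        """Format the request as plain text, marking missing items as unknown."""
        def show(value):
            return "unknown" if value is None or value == "" else str(value)

        edits = {True: "yes", False: "no", None: "unknown"}[self.lab_will_edit_after_etl]
        return "\n".join([
            "Lab/PI name: " + show(self.lab_or_pi),
            "Approximate number of records: " + show(self.approx_records),
            "File format: " + show(self.file_format),
            "Deadlines or dependencies: " + show(self.deadlines),
            "Date file might be available: " + show(self.available_on),
            "Lab will edit records after ETL: " + edits,
        ])
```

Printing "unknown" rather than omitting a line makes it obvious to the software implementation team which details still need to be collected from the lab.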
Curation Workflow
1. Make note of the following from the pre-ETL workflow:
• Any deadlines or dependencies for ETLing the data
• Whether the lab will be annotating or editing these records after ETL
2. Check files for appropriate data and required fields:
• If data is missing, obtain it from the web, Resource Navigator, or another
resource.
• Publications: enter the PubMed accession number (PMID; this can be found at
http://www.pubmed.com)
• Enter the title of the publication.
• Enter the PMID number only (do not write “PMID: ###”).
• If new terms are needed, write JIRA tickets for term requests.
3. Ensure proper formatting:
• Capitalize the first letter of words.
• Capitalize the first letter of the first word in the resource description, and end it with a period.
• Move data to the resource description if it does not match a label in the data tool
(determine if the data should be included).
• Migrate data into the proper template (stored on SVN).
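Two of the formatting rules above are mechanical enough to check with a script before a file is handed over for ETL. The sketch below is illustrative only; the function names are hypothetical and it covers just the PMID rule and the resource description rule.

```python
import re


def clean_pmid(raw):
    """Return only the digits of a PubMed ID, dropping any 'PMID:' prefix."""
    match = re.fullmatch(r"\s*(?:PMID\s*:?\s*)?(\d+)\s*", raw, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a PubMed ID: %r" % raw)
    return match.group(1)


def format_description(text):
    """Capitalize the first letter and make sure the description ends with a period."""
    text = text.strip()
    if not text:
        return text
    text = text[0].upper() + text[1:]
    if not text.endswith("."):
        text += "."
    return text
```

Only the first letter is changed, so capitalization inside the description (gene symbols, acronyms) is left as the lab wrote it.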
Multiple instances for the same field
If you have more than one instance for a field, separate them with semicolons.
Note: If you use semicolons to separate data in the Excel spreadsheet, the data will go into separate
fields in the data tool.
For example, if you have two techniques, such as Western blot analysis and Northern blot
analysis, enter them in the Excel file as: Western blot analysis; Northern blot analysis
If you have multiple instances that correspond across several fields, annotate them as
shown in the example for construct inserts:

Column                      Cell contents
Gene/Insert symbol          EFT2 promoter; Gal4(1-147); Pho4(61-99)
Gene/Insert Entrez Gene ID  851993; 855828; 850594
Insert Species Origin       Saccharomyces cerevisiae; Saccharomyces cerevisiae; Saccharomyces cerevisiae

These data will be entered into the repository as:
1) Construct_Insert_1 Insert_Symbol EFT2 Promoter
   Construct_Insert_1 Insert_Entrez Gene ID 851993
   Construct_Insert_1 Insert Species Origin Saccharomyces cerevisiae
2) Construct_Insert_2 Insert_Symbol Gal4(1-147)
   Construct_Insert_2 Insert_Entrez Gene ID 855828
   Construct_Insert_2 Insert Species Origin Saccharomyces cerevisiae
3) Construct_Insert_3 Insert_Symbol Pho4(61-99)
   Construct_Insert_3 Insert_Entrez Gene ID 850594
   Construct_Insert_3 Insert Species Origin Saccharomyces cerevisiae
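The way semicolon-separated cells line up into separate instances can be sketched in a few lines. This is not the actual ETL code, only an illustration of the rule: the n-th value in each cell belongs to the n-th instance, so every cell in the group must hold the same number of values. The function names are hypothetical.

```python
def split_values(cell):
    """Split a semicolon-separated cell into trimmed, non-empty values."""
    return [part.strip() for part in cell.split(";") if part.strip()]


def align_instances(row):
    """Pair the n-th value of every field into the n-th instance."""
    columns = {field: split_values(cell) for field, cell in row.items()}
    counts = {field: len(values) for field, values in columns.items()}
    if len(set(counts.values())) != 1:
        raise ValueError("fields hold different numbers of values: %r" % counts)
    size = next(iter(counts.values()))
    return [
        {field: values[i] for field, values in columns.items()}
        for i in range(size)
    ]


row = {
    "Insert_Symbol": "EFT2 promoter; Gal4(1-147); Pho4(61-99)",
    "Insert_Entrez Gene ID": "851993; 855828; 850594",
    "Insert Species Origin": "Saccharomyces cerevisiae; Saccharomyces cerevisiae; "
                             "Saccharomyces cerevisiae",
}
inserts = align_instances(row)
```

Here `inserts` holds three records, one per construct insert. A cell with a missing value (for example two gene IDs for three symbols) raises an error instead of silently shifting the remaining values, which is the mistake to look for when checking a file by hand.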
Ready for ETL
1. Move the file to the “to-be-ETLd-in-draft-state” folder, into the appropriate subfolder for the
institution it should be ETLd into.
2. Send an e-mail to the ETL working group.
Post-ETL
Open the eagle-i data tool to ensure the data has been uploaded into the correct fields.
Use the following list to confirm that no internal or sensitive data is showing on our instance pages.
• Properties currently (0.6.4 ontology) marked in the AdminData property group (see
the property group description for more detail):
- curator_notes
- lab_data_format
- lab_inventory_system
• Privacy-sensitive data should not be shown on the dissemination page. These properties
should be marked with PropertyGroup_ContactLocation:
- contact_email
- lab_delivery_address
- mailing_address
- phone_number
• rdfs:comments should also not be shown on instance pages (as per recent
bug http://jira.eagle-i.net:8080/browse/DTOOLS-635)
• Do not publish the data before checking with the data contributor and
software implementation team, as the lab may want to go back and validate the
data.
• Once the data has been validated by the contributor and lab, the data can be
published. For very large files, the resources can be published in bulk by the
software implementation team upon request.
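The post-ETL check for properties that should not be displayed can be written down as a simple set comparison. The property names below are copied from the list above; how the displayed property names are obtained from an instance page is not described in this guide, so the input list and the function name are assumptions.

```python
# Properties in the AdminData property group (0.6.4 ontology).
ADMIN_DATA = {"curator_notes", "lab_data_format", "lab_inventory_system"}

# Privacy-sensitive properties marked with PropertyGroup_ContactLocation.
CONTACT_LOCATION = {
    "contact_email",
    "lab_delivery_address",
    "mailing_address",
    "phone_number",
}

# rdfs:comment values should not appear on instance pages either.
HIDDEN = ADMIN_DATA | CONTACT_LOCATION | {"rdfs:comment"}


def leaked_properties(displayed):
    """Return, in sorted order, the displayed properties that should be hidden."""
    return sorted(set(displayed) & HIDDEN)
```

An empty list means none of the listed properties is showing; anything returned should be reported to the software implementation team before the data is published.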
Contact the software implementation team if you have any questions about the process or
the status of a particular file.
Catalyst
The Catalyst System at Harvard Core Laboratories
The eagle-i repository now supplies all of the core facility data that is displayed in the Harvard Catalyst
(http://cbmi.catalyst.harvard.edu/cores/index.html) database of core facilities at Harvard University.
This includes core names, descriptions, addresses, affiliates, websites, and contacts. Catalyst also
displays information about any instruments, software and services attached to each core.
Curators should follow the same basic guidelines for Harvard cores as they would with any other core
facility in another institution. But they should also be aware of the following Harvard-specific
guidelines in order for the information to be compliant with Catalyst:
• Organization Type: All organizations that should be included in the Catalyst system need to
have the type “Core Laboratory.” This is because the Catalyst fields have been mapped to the
fields in the eagle-i core laboratory resource form.
• Website(s): the URL of the Catalyst record for a particular core should never be included in
the eagle-i record’s “Website(s)” field. This is to prevent the Catalyst record from displaying a
link to itself.
• Lab address: the Catalyst location field is populated by the eagle-i “Mailing Address” field.
Curators should ensure that if there is an address for the lab, it is contained in that field. The
“Lab Delivery Address” field can also be filled in, but any different information there will not be
displayed in Catalyst.
• Institutional affiliations: the Catalyst institutions field is populated by the eagle-i primary
“Affiliation” field. This field should only contain organizations from the following controlled list
of approved Harvard institutions:
  o Beth Israel Deaconess Medical Center
  o Brigham and Women's Hospital
  o Broad Institute
  o Children's Hospital Boston
  o Dana-Farber Cancer Institute
  o Dana-Farber/Harvard Cancer Center
  o Forsyth Institute
  o Harvard Faculty of Arts and Sciences
  o Harvard Medical School
  o Harvard Pilgrim Healthcare
  o Harvard School of Public Health
  o Harvard Stem Cell Institute
  o Immune Disease Institute
  o Joslin Diabetes Center
  o Massachusetts General Hospital
  o Partners HealthCare Center for Personalized Genetic Medicine
  o Schepens Eye Research Institute
Any other institutional affiliations should be populated only in the Secondary affiliation field, and
will not be displayed in Catalyst.
• Naming conventions: to distinguish multiple labs with the same or similar names, all
Harvard core laboratories should include the name or abbreviation of their primary affiliate. If
the affiliate is already a part of the official name, then nothing further is required. If it isn’t,
the abbreviation should be appended to the end of the name in parentheses, i.e. Name of Lab
(Primary affiliate abbreviation).
For example:
Flow Cytometry Core Laboratory (BWH)
BWH Transgenic Mouse Facility
Click the Institutions link on the left sidebar on Catalyst to see a list of the current institutions and
abbreviations used.
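Several of the Harvard-specific guidelines can be checked mechanically before a core record is published. The sketch below is an illustration, not part of eagle-i or Catalyst: the record is a plain dictionary with hypothetical keys (`type`, `websites`, `primary_affiliation`), the function names are assumptions, and the affiliation list is copied from the controlled list above.

```python
APPROVED_AFFILIATIONS = {
    "Beth Israel Deaconess Medical Center",
    "Brigham and Women's Hospital",
    "Broad Institute",
    "Children's Hospital Boston",
    "Dana-Farber Cancer Institute",
    "Dana-Farber/Harvard Cancer Center",
    "Forsyth Institute",
    "Harvard Faculty of Arts and Sciences",
    "Harvard Medical School",
    "Harvard Pilgrim Healthcare",
    "Harvard School of Public Health",
    "Harvard Stem Cell Institute",
    "Immune Disease Institute",
    "Joslin Diabetes Center",
    "Massachusetts General Hospital",
    "Partners HealthCare Center for Personalized Genetic Medicine",
    "Schepens Eye Research Institute",
}


def catalyst_name(name, affiliate, abbreviation):
    """Append the affiliate abbreviation unless the name already mentions the affiliate."""
    lowered = name.lower()
    # A plain substring test; a very short abbreviation could match by accident.
    if affiliate.lower() in lowered or abbreviation.lower() in lowered:
        return name
    return "%s (%s)" % (name, abbreviation)


def catalyst_issues(record, catalyst_url):
    """Return a list of guideline violations for one core laboratory record."""
    issues = []
    if record.get("type") != "Core Laboratory":
        issues.append("type must be Core Laboratory")
    if catalyst_url in record.get("websites", []):
        issues.append("Website(s) must not contain the Catalyst record URL")
    if record.get("primary_affiliation") not in APPROVED_AFFILIATIONS:
        issues.append("primary affiliation is not on the approved list")
    return issues
```

With the two examples above, `catalyst_name("Flow Cytometry Core Laboratory", "Brigham and Women's Hospital", "BWH")` appends "(BWH)", while "BWH Transgenic Mouse Facility" is returned unchanged.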
In addition, there are some technical steps that need to be taken to keep the Catalyst interface up to
date:
1. Before publishing a new core laboratory in the Harvard repository, eagle-i curators will need
to contact the appropriate person at Catalyst so that he or she can assign a category for the core.
For mapping purposes, eagle-i will need to provide the name of the core exactly as it appears in
the eagle-i record as well as the eagle-i URI.
2. Any changes to the name on the eagle-i record will need to be communicated to the
appropriate contact at Catalyst so that it can be manually updated in the Catalyst sidebar. (Note:
this is a temporary workaround until the system can be synchronized.)
3. Before withdrawing or deleting an eagle-i core record, curators should let the appropriate
Catalyst contact know so the core can also be withdrawn from the sidebar listing of core
facilities.
4. The eagle-i ontology team needs to collaborate with the Catalyst team when making any
changes to the ontology which will affect fields that are mapped to the Catalyst records.