class_9_ontologies_101308

Ontology creation in Protégé
Overview
Exercise 1 – Protégé Overview
Exercise 2 – Data modeling for Ontology creation
Exercise 3 – Designing a thesaurus/ontology
Overview
In this set of exercises, we become familiar with designing and implementing controlled vocabulary systems
including word lists, thesauri, taxonomies, and ontologies. Our first exercise will include an overview of
Protégé operations. Our second exercise will model Dublin Core records. Our third exercise will engage in
thesaurus design and replication.
Exercise 1 – Protégé Overview
This introduction covers basic file creation and data population in Protégé. Subsequent exercises cover design
concepts.
Definitions:
1. Project
a. The ‘file’ which contains all of the content & structure of your ontology
2. Class
a. A high level organizational structure that allows you to show hierarchical relationships
i. Concrete – A structure that can have direct “instances” of an object described underneath
it
ii. Abstract – A structure that is a ‘pure’ concept. An abstract class cannot have direct
instances; it can only have child classes
3. Slot
a. A descriptive facet. Slots are in essence the nuts & bolts of data modeling in Protégé. Slots are used
to describe ‘instances’
4. Instances
a. A concrete object or concept. Instances are described by the slots that have been created and
assigned to them.
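The four definitions above can be sketched in a few lines of Python. This is an illustrative model only, not Protégé's own API; the names `Slot`, `OntClass`, and `Instance`, and the example class and slot names, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """A descriptive facet, e.g. 'name' with value type 'string'."""
    name: str
    value_type: str = "string"


@dataclass
class OntClass:
    """A node in the class hierarchy; abstract classes take no direct instances."""
    name: str
    parent: "OntClass | None" = None
    abstract: bool = False
    own_slots: list = field(default_factory=list)

    def all_slots(self):
        """Slots defined on this class plus those inherited from its ancestors."""
        inherited = self.parent.all_slots() if self.parent else []
        return inherited + self.own_slots


@dataclass
class Instance:
    """A concrete object described by the slots of its class."""
    cls: OntClass
    values: dict

    def __post_init__(self):
        if self.cls.abstract:
            raise ValueError(f"{self.cls.name} is abstract; it cannot have direct instances")
        allowed = {s.name for s in self.cls.all_slots()}
        unknown = set(self.values) - allowed
        if unknown:
            raise ValueError(f"unknown slots: {sorted(unknown)}")


# Hypothetical example: an abstract top-level class and one concrete child.
thing = OntClass("Thing", abstract=True, own_slots=[Slot("name")])
website = OntClass("Website", parent=thing, own_slots=[Slot("url")])
example = Instance(website, {"name": "Example site", "url": "http://example.org"})
```

Defining `name` once on the top-level class and inheriting it in `Website` mirrors the advice to re-use slots rather than redefine them per class.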
Procedures
1. To create a new project
a. Open Protégé, Click on File >> New Project, Enter a name for your project
2. To create a class
a. With the “Thing” class selected, click on the “create class” icon or right click in the class browser
window
b. Name your class
c. To create a child class, right click on the class that will be its parent
and choose ‘create class’
3. Create slots
a. From within a class, right click in the template slots pane and select “create slot”
b. A slot entry window pops up.
c. Enter a name, select a datatype (string) and set other rules (multiple, cardinality, etc).
d. To create a slot which refers to another instance (Related Term, Broader Term, etc), change value
type to instance, and under the “allowed classes” box, select the class which will contain the
instances you want to link to
4. Assigning slots to classes
a. You can do this in two ways. First, you can choose your class and then follow the slot creation
process above. Second, if you have already created a slot, you can choose “Add slot” from the right click
menu
b. It is preferable to re-use slots as much as possible (for example, define one ‘name’ slot at a high
level & re-use it). This allows you to have a cleaner and more interoperable ontology
5. Create instances
a. Go to the instances tab
b. Select the class that you want to create an instance for & click on the create instance icon
c. Enter in information for your instance
d. Set the display slot for your instances
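The slot rules described in the procedures above (value type, multiple values, allowed classes) can be illustrated with a small validator. This is a sketch under stated assumptions, not how Protégé enforces them: the function `check_slot_value`, the `SlotError` exception, and the use of a dict with a `"class"` key to stand in for an instance are all hypothetical.

```python
class SlotError(ValueError):
    """Raised when a value breaks a slot's rules."""


def check_slot_value(value, value_type, allowed_classes=(), multiple=False):
    """Validate a slot value against simplified, Protégé-style rules.

    value_type: "string" or "instance"
    allowed_classes: for "instance" slots, class names a linked instance may belong to
    multiple: whether the slot may hold more than one value
    Returns the value(s) as a list.
    """
    values = value if isinstance(value, list) else [value]
    if len(values) > 1 and not multiple:
        raise SlotError("slot allows only a single value")
    for v in values:
        if value_type == "string":
            if not isinstance(v, str):
                raise SlotError(f"{v!r} is not a string")
        elif value_type == "instance":
            # Linked instances are represented here as dicts with a "class" key.
            if not isinstance(v, dict) or v.get("class") not in allowed_classes:
                raise SlotError(f"{v!r} is not an instance of {list(allowed_classes)}")
        else:
            raise SlotError(f"unsupported value type {value_type!r}")
    return values


# A "Broader Term" slot that may only point at instances of the class "Term".
parent_term = {"class": "Term", "name": "Placeholder parent"}
check_slot_value(parent_term, "instance", allowed_classes=("Term",))
```

Restricting an instance slot to specific allowed classes is what keeps a Related Term or Broader Term link from pointing at an unrelated kind of record.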
Exercise 2 – Data modeling for Ontology creation
In this exercise, we will use Protégé to create a set of Dublin Core records. While this is technically not an
ontology or a taxonomy, it does show how flexible a system Protégé is.
1. Create your DC data model and populate it with a few records. You may need to reference the
procedures documented in exercise 1.
a. Create a top level class called “Dublin Core”
b. Create a child class of Dublin Core called “record”
c. Create slots for record that correspond to your DC elements
i. Title, Creator, Identifier, etc.
d. Create a few instances of DC records (catalog a few websites for example)
2. Now that we have a basic data model, let’s define some subjects to use in our DC modeling system
a. Go back to your class tab and add a new top level class called “subjects”
i. Add a child class called “local” to which we will add some local subject headings
ii. Add a slot called “name” to the “subjects” parent class. This will hold the name of our
subject heading
b. Go to your instances tab and create a few subjects related to the websites you just cataloged
3. Now that we have our subjects class/instances created, let’s add a subject slot to our DC record class
a. Go to our Dublin Core record class and create a new slot called “subjects”
i. For value type, choose “instance.” This will allow us to enter subjects that we have
previously entered into our subjects area
ii. Under allowed classes, choose the subjects class.
4. With our updated data model, let’s go back and edit our DC records. Add some of the subjects that we
just entered into our records.
5. Now that we have some data entered, perform a query or two on your database.
a. Click on the Queries tab, then choose a class, a slot, and a value to search for.
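The data model built in this exercise, and the class/slot/value query in the last step, can be pictured roughly as follows. This is only a sketch in plain Python, not Protégé's storage format or query engine; the record titles, identifiers, subject names, and the `query` helper are placeholder assumptions.

```python
# Subjects are instances of the "local" class under "subjects"; each has a "name" slot.
subjects = {
    "ontologies": {"class": "local", "name": "Ontologies"},
    "metadata": {"class": "local", "name": "Metadata"},
}

# Each record is an instance of the "record" class under "Dublin Core".
# The "subjects" slot holds references to subject instances, not free text.
records = [
    {"title": "Example site A", "identifier": "http://example.org/a",
     "subjects": [subjects["ontologies"], subjects["metadata"]]},
    {"title": "Example site B", "identifier": "http://example.org/b",
     "subjects": [subjects["metadata"]]},
]


def query(instances, slot, value):
    """Return instances whose slot equals the value or, for multi-valued slots, contains it."""
    hits = []
    for inst in instances:
        held = inst.get(slot)
        if held == value or (isinstance(held, list) and value in held):
            hits.append(inst)
    return hits


about_ontologies = query(records, "subjects", subjects["ontologies"])
print([r["title"] for r in about_ontologies])
```

Because the subjects slot points at shared subject instances, renaming a subject heading once updates every record that uses it, which is the main benefit of choosing the “instance” value type over a plain string.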
Exercise 3 – Designing a thesaurus/ontology
1. Form into small groups (3-5 people). Visit the Boxes & Arrows Glosso Thesaurus http://www.boxesandarrows.com/view/controlled_vocabularies_a_glosso_thesaurus.
2. Look at the data there and come up with a structure in Protégé that allows replication of the thesaurus.
Think about how you would model this data set in Protégé. Would you have one class called “terms”
with lots of instances? Would you have lots of hierarchical classes?
3. Some issues to consider are:
a. Do you want terms to be classes or instances?
b. What do you need to represent in hierarchies (as classes) and what do you need to represent in
description (slots)?
c. What is the easiest way to show the relationships (broader term, narrower term, etc)?
d. Do you need to allow multiple relationships for a given type (BT, RT, etc)?
e. If you have multiple classes, at what level should you create the slots?
4. Using the whiteboard next to your group – write out a model for your thesaurus representation. Model
at least the classes/slots and write up one example of an instance. Note – in order to create broader
term/narrower term relationships, you may have to jump around the thesaurus a bit.
5. If you have time – create your structure in Protégé
a. Decide on a data structure (classes, slots, instances)
b. Implement your framework (classes & slots)
c. Complete data entry (instance creation)
6. Some possible techniques/concepts to keep in mind:
a. Data modeling process
i. Bottom up/top down design, Card sorting, Data diagramming
b. Tools available
i. Classes – A high level concept. Concepts can be concrete (an actual object with slot
values) or abstract (a conceptual entity with only child concepts).
ii. Slots – A descriptive facet or quality of a concept (class) (Name, acidity, URL)
iii. Instances – The most specific information in a taxonomy or ontology – the full record of
the “thing”
c. Concepts
i. Parsimony – be exact and conservative in your use of terms/qualifiers/descriptors – do
not combine concepts into a single class/slot
ii. Concept inheritance/transitivity or “IS-A”/“Kind-of” relationships – make sure that when
you are creating hierarchies your child concepts are a “kind of” your parent concepts.
Make sure that each child object can inherit the class of its parent and grandparent
iii. Plural vs. Singular names – Choose one or the other
iv. Capitalization – Be consistent!
v. Design your ontology by analyzing concepts not terms – Synonyms for a concept (shrimp,
prawn, crevette) represent a single class concept.
vi. Common ground – at each level of a taxonomy/hierarchy the concepts/classes should be
of the same type (they should all be siblings of each other)
vii. Specificity – Have you addressed the Concept down to the most appropriate level?
viii. Exhaustivity – have you described all of the concepts in your Ontology? Are there
additional properties to be described? Are there additional relationships to represent?
ix. Multiple Inheritance – A class can be a sub-class of multiple parents
x. Fixity – classes, once defined, should be fairly static
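One possible answer to the modeling questions above is to treat every term as an instance with broader term (BT), narrower term (NT), and related term (RT) slots. The sketch below assumes that choice; it is not the only valid design. The `Term` class and the sample terms are hypothetical and are not taken from the Glosso Thesaurus.

```python
class Term:
    """A thesaurus term modeled as an instance with relationship slots."""

    def __init__(self, name, synonyms=()):
        self.name = name
        self.synonyms = list(synonyms)  # other labels for the same concept
        self.broader = []   # BT: may hold several terms (multiple parents)
        self.narrower = []  # NT: kept in sync as the inverse of BT
        self.related = []   # RT: not used in this short example

    def add_broader(self, parent):
        """Record parent as a BT of this term and this term as an NT of parent."""
        if parent not in self.broader:
            self.broader.append(parent)
            parent.narrower.append(self)

    def ancestors(self):
        """All terms reachable by following BT links, since "kind of" is transitive."""
        found = []
        stack = list(self.broader)
        while stack:
            term = stack.pop()
            if term not in found:
                found.append(term)
                stack.extend(term.broader)
        return found


# Synonyms share one concept, so they are labels on a single term.
seafood = Term("Seafood")
crustacean = Term("Crustacean")
shrimp = Term("Shrimp", synonyms=["prawn", "crevette"])
crustacean.add_broader(seafood)
shrimp.add_broader(crustacean)

print([t.name for t in shrimp.ancestors()])
```

Keeping BT and NT as inverse slots means each relationship is entered once, and walking the BT chain is a quick check that every child really is a “kind of” its parent and grandparent.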