Toward Using Ontologies to Reason About Disagreeing Taxonomic Experts Dave Thau

Toward Using Ontologies to
Reason About Disagreeing
Taxonomic Experts
Dave Thau
UC Davis
thau@learningsite.com
Why Did The Chicken Cross The Road?
• To get to the other side.
• To boldly go where no chicken has gone before.
• Zeno of Elea: To prove it could never reach the other side.
• Chickens, over great periods of time, have been
naturally selected so that they are now
predisposed to cross roads.
thau@learningsite.com
NeSC RDF Workshop June 8, 2006
2/25
Why did the taxonomists cross the road?
So they could properly identify the chicken.
Overview
• Quick primer on taxonomy
• Some types of disagreements between
experts
• Problems this causes
• Using an ontology to represent taxonomic
opinions
• Using the ontology to compare experts’
theories
Linnaean Taxonomy Basics
Ranks: kingdom, phylum, class, order, family,
genus, species, variety (and others!)
Family rank: Canidae
Genus rank: Vulpes, Canis, Nyctereutes
Species rank (genus Canis): Canis lupus, Canis latrans, Canis familiaris
Things you may not know
• There is no big list of all the known species in
the world
• This is partly because people don’t agree on the
definitions of the species, genera, etc.
• Estimates are that 6% of the known taxa are
changed every year
• This has been going on since Linnaeus
published his classification scheme in 1735
Types of Disagreement: The Basics
[Figure: Ranunculus aquatilis as classified by Benson, 1948 and by FNA-03, 1997. Variety labels shown: R.a. var calvescens, R.a. var capillaceus, R.a. var aquatilis, R.a. var diffusus, R.a. var hispidulus.]
Five basic relations can hold between two taxa A and B:
• A ≡ B (A is congruent to B)
• A ⊂ B (A is included in B)
• A ⊃ B (A contains B)
• A overlap B
• A disjoint B
This results in 5^12 (more than 240 million) possible sets
of relationships.
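The figure of more than 240 million is consistent with five possible relations for each of twelve pairs of taxa. The count of twelve pairs is an assumption here (for instance, three taxa in one classification paired with four in the other); the sketch below only checks the arithmetic.

```python
# Assumption: 5 basic relations (congruent, included in, contains,
# overlap, disjoint) and 12 cross-classification pairs of taxa.
NUM_RELATIONS = 5
NUM_PAIRS = 12  # hypothetical: e.g. 3 taxa in one classification x 4 in the other

# Each pair independently takes one of the relations.
possible_assignments = NUM_RELATIONS ** NUM_PAIRS
print(f"{possible_assignments:,}")
```

Most of these assignments are logically impossible, which is what makes automated reasoning over them useful.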
Types of Disagreement: Splitting and Lumping
Kartesz, 2004: Ranunculus flammula, with R.f. var filiformis and R.f. var flammula
Benson, 1948: Ranunculus flammula, with R.f. var filiformis, R.f. var genuiinus, and R.f. var ovalis
Peet, 2005:
B.1948:R.flammula is congruent to K.2004:R.flammula
B.1948:R.f. genuiinus is included in K.2004:R.f.flammula
B.1948:R.f.ovalis is included in K.2004:R.flammula
B.1948:R.f.filiformis is congruent to K.2004:R.f.filiformis
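Correspondence statements of this shape are regular enough to turn into data. The following is a minimal sketch, assuming the four relation phrases that appear on these slides; the function name `parse_correspondence` and the triple layout are hypothetical, not part of any existing tool.

```python
import re

# Relation phrases as they appear in the correspondence lines on these slides.
RELATIONS = ("is congruent to", "is included in", "is contained by", "contains")

PATTERN = re.compile(
    r"^(?P<left>.+?)\s+(?P<relation>" + "|".join(RELATIONS) + r")\s+(?P<right>.+)$"
)

def parse_correspondence(line):
    """Split one correspondence statement into (left, relation, right)."""
    match = PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized correspondence: {line!r}")
    return match["left"], match["relation"], match["right"].strip()

statements = [
    "B.1948:R.flammula is congruent to K.2004:R.flammula",
    "B.1948:R.f.ovalis is included in K.2004:R.flammula",
]
triples = [parse_correspondence(s) for s in statements]
```

Each triple keeps the author-and-year prefix on both sides, so the same name used by two experts stays distinguishable.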
Types of Disagreement –
Differing Extents
Benson, 1948: Ranunculus glaberrimus, with R.g. var reconditus, R.g. var ellipticus, and R.g. var typicus
Kartesz, 2004: Ranunculus glaberrimus, with R.g. var ellipticus and R.g. var glaberrimus
Peet, 2005:
B.1948:R. glaberrimus contains K.2004:R. glaberrimus
B.1948:R.g.ellipticus is congruent to K.2004:R.g.ellipticus
B.1948:R.g.typicus is congruent to K.2004:R.g.glaberrimus
B.1948:R.g.reconditus is congruent to K.2004:R. triternatus
Impact on Data Analysis
• Can’t find data
– If A ⊇ B (A contains B), a search on A should retrieve B
• Can’t aggregate data
– If B ⊆ A (B is contained by A), you should be able to combine data
from B into A
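Both points amount to expanding a taxon name to everything it contains before querying. A minimal sketch, assuming containment is stored as a plain dictionary; the record layout and the counts are invented for illustration.

```python
def expand(concept, contains):
    """Return the concept together with everything it transitively contains."""
    found, stack = {concept}, [concept]
    while stack:
        for narrower in contains.get(stack.pop(), ()):
            if narrower not in found:
                found.add(narrower)
                stack.append(narrower)
    return found

def search(concept, contains, records):
    """Retrieve records filed under the concept or any concept it contains."""
    wanted = expand(concept, contains)
    return [r for r in records if r["concept"] in wanted]

# "A contains B" stored as A -> [B].
contains = {"B.1948:R. glaberrimus": ["K.2004:R. glaberrimus"]}

# Hypothetical observation records, invented for illustration.
records = [
    {"concept": "B.1948:R. glaberrimus", "count": 3},
    {"concept": "K.2004:R. glaberrimus", "count": 5},
    {"concept": "K.2004:R. flammula", "count": 2},
]

hits = search("B.1948:R. glaberrimus", contains, records)
total = sum(r["count"] for r in hits)  # aggregate the narrower concept into the wider one
```

Without the containment map, a search on the wider concept would miss the records filed under the narrower one.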
What to do in case of conflicting
experts?
• Just listen to one expert you like
• Pick an expert you like and everyone who
agrees with this expert (and each other)
• Choose experts who form the largest set
of agreeing experts
• Choose experts whose opinions
encompass the smallest or largest number
of taxa
How can we find out which experts
agree?
• Represent taxonomy using logic
• Use the logic to determine relations
between expert opinions (theories)
– Two theories may conflict
– Two theories may be equivalent
– One theory may encompass another
Representation Details
• Based on the Taxon Concept Schema
(TCS)
• Represented using Description Logic
(OWL DL)
Example Ontology
[Figure: example ontology with three kinds of node: Taxon, Taxon Description, and Specimen.
Taxon classes: "Things in the species Ranunculus glaberrimus", "Things in the genus Ranunculus".
Taxon descriptions: "Ranunculus glaberrimus (Kartesz, 2004)", "Ranunculus (Kartesz, 2004)".
Properties: hasSpecies, hasGenus.]
Fundamental Assumptions
• Each Taxon class has at least one
instance
• Each Taxon class is defined as the union
of its subclasses
• A class’s subclasses are defined to be
mutually disjoint
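These three assumptions can be illustrated on explicit sets of specimens. This is only a closed-world stand-in for what an OWL DL reasoner would check; the function name, the class labels used as dictionary keys, and the specimens are all hypothetical.

```python
from itertools import combinations

def check_assumptions(extent, children):
    """Return a list of violations of the three assumptions, empty if none."""
    problems = []
    for name, members in extent.items():
        if not members:
            problems.append(f"{name} has no instance")
    for parent, kids in children.items():
        covered = set().union(*(extent[k] for k in kids))
        if covered != extent[parent]:
            problems.append(f"{parent} is not the union of its subclasses")
        for a, b in combinations(kids, 2):
            if extent[a] & extent[b]:
                problems.append(f"{a} and {b} are not disjoint")
    return problems

# Hypothetical specimens s1..s3, invented for illustration.
extent = {
    "R. glaberrimus": {"s1", "s2", "s3"},
    "R.g. var ellipticus": {"s1", "s2"},
    "R.g. var glaberrimus": {"s3"},
}
children = {"R. glaberrimus": ["R.g. var ellipticus", "R.g. var glaberrimus"]}

ok = check_assumptions(extent, children)
# Putting s2 in both varieties breaks the disjointness assumption.
bad = check_assumptions({**extent, "R.g. var glaberrimus": {"s2", "s3"}}, children)
```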
Questions Ontology Can Answer
• Find the subclasses of a class
• Make sure the taxonomy is consistent
• See if two classes are equivalent
• Can also use it to compare expert opinions
Compatible Theories
• A theory is one expert’s set of classes and
relations and all they imply.
• A set of theories is compatible if
– Each theory is consistent and
– The correspondences between classes in the
theories do not cause inconsistency.
Example Incompatibility
Benson, 1948: Ranunculus hydrocharoides, with R.h. var natans, R.h. var stolonifer, and R.h. var typicus
Kartesz, 2004: Ranunculus hydrocharoides, with R.h. var stolonifer and R.h. var typicus
Peet, 2005:
B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
B.1948:R. hydrocharoides is congruent to K.2004:R. hydrocharoides
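One way to see why these correspondences cannot all hold is to search for any population of specimens that satisfies them. The sketch below is a brute-force stand-in for a description logic reasoner, not the method used in the talk. It assumes that var natans appears only in Benson's classification, and it builds in the fundamental assumptions: every class has an instance, sibling varieties are disjoint, and a species is the union of its varieties.

```python
from itertools import combinations, product

B_VARIETIES = ["natans", "stolonifer", "typicus"]  # Benson, 1948 (assumed grouping)
K_VARIETIES = ["stolonifer", "typicus"]            # Kartesz, 2004

# A specimen "type" records which variety it falls under for each expert
# (None means outside that expert's R. hydrocharoides). One slot per expert
# makes sibling varieties disjoint by construction.
TYPES = list(product([None] + B_VARIETIES, [None] + K_VARIETIES))

def satisfies(present, species_congruent=True):
    """Check the correspondences against a set of inhabited specimen types."""
    b = {v: {t for t in present if t[0] == v} for v in B_VARIETIES}
    k = {v: {t for t in present if t[1] == v} for v in K_VARIETIES}
    if not (all(b.values()) and all(k.values())):  # every class needs an instance
        return False
    if b["stolonifer"] != k["stolonifer"] or b["typicus"] != k["typicus"]:
        return False
    if species_congruent:
        # Each species is the union of its varieties.
        return set().union(*b.values()) == set().union(*k.values())
    return True

def find_model(species_congruent=True):
    """Return one set of inhabited types meeting all constraints, or None."""
    for size in range(1, len(TYPES) + 1):
        for present in combinations(TYPES, size):
            if satisfies(present, species_congruent):
                return set(present)
    return None

print(find_model())                                     # None: no population fits
print(find_model(species_congruent=False) is not None)  # True
```

With all three correspondences in force, any var natans specimen would have to fall under one of Kartesz's two varieties, which are congruent to Benson's other two varieties and therefore disjoint from var natans. Dropping the species-level congruence removes the conflict.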
Example Incompatibility
Benson, 1948: Ranunculus, with R. macranthus, R. petiolaris, …
Kartesz, 2004: Ranunculus, with R. petiolaris, …
B.48:R. petiolaris ⊆ K.04:R. petiolaris ⊆ B.48:R. macranthus contradicts
B.48:R. macranthus and B.48:R. petiolaris are disjoint.
Peet, 2005:
B.1948:R. macranthus contains K.2004:R. petiolaris
B.1948:R. petiolaris is contained by K.2004:R. petiolaris
Inferring Unstated Correspondences
Benson, 1948: Ranunculus arizonicus, with R.a. var chihuahua and R.a. var typicus
Kartesz, 2004: Ranunculus arizonicus
Peet, 2005:
B.1948:R.a.typicus is included in K.2004:R. arizonicus
B.1948:R. arizonicus is congruent to K.2004:R. arizonicus
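The unstated correspondence here follows by chaining: var chihuahua is a subclass of Benson's R. arizonicus, which is congruent to Kartesz's. A minimal sketch of that chaining, assuming var chihuahua belongs to Benson's classification and treating congruence as inclusion in both directions; the function name is hypothetical, and a real reasoner would derive more than this (for example, that the inclusion is proper).

```python
def implied_inclusions(included_in, congruent):
    """Return every (a, b) with a included in b that follows by chaining."""
    edges = {}
    for a, b in included_in:
        edges.setdefault(a, set()).add(b)
    for a, b in congruent:  # congruence is inclusion in both directions
        edges.setdefault(a, set()).add(b)
        edges.setdefault(b, set()).add(a)
    implied = set()
    for start in edges:
        stack, seen = [start], {start}
        while stack:
            for wider in edges.get(stack.pop(), ()):
                if wider not in seen:
                    seen.add(wider)
                    stack.append(wider)
                    implied.add((start, wider))
    return implied

included_in = {
    ("B.1948:R.a.chihuahua", "B.1948:R. arizonicus"),  # subclass in Benson's theory
    ("B.1948:R.a.typicus", "B.1948:R. arizonicus"),    # subclass in Benson's theory
    ("B.1948:R.a.typicus", "K.2004:R. arizonicus"),    # stated correspondence
}
congruent = {("B.1948:R. arizonicus", "K.2004:R. arizonicus")}  # stated correspondence

stated = included_in | congruent | {(b, a) for a, b in congruent}
unstated = implied_inclusions(included_in, congruent) - stated
print(unstated)
```

Under these inputs the only new pair is that B.1948:R.a.chihuahua is included in K.2004:R. arizonicus.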
Comparing Theories
• Given two compatible theories, T and T’:
– The theories are congruent if each class in
theory T is equivalent to one class in T’ (and
vice versa).
– T is smaller than T’ if each class in T
either equals or is contained by a class in T’.
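The two definitions can be sketched over explicit class extents. Modeling a theory as a mapping from class name to a set of specimens is an assumption made for illustration; in the talk the relations between classes come from reasoning over the ontology, and the theories and specimens below are invented.

```python
def congruent(t1, t2):
    """Each class in t1 equals some class in t2, and vice versa."""
    return set(t1.values()) == set(t2.values())

def smaller_or_equal(t1, t2):
    """Each class in t1 equals or is contained by some class in t2."""
    return all(any(c <= d for d in t2.values()) for c in t1.values())

# Hypothetical theories over invented specimens s1..s3.
T = {
    "A": frozenset({"s1", "s2"}),
    "B": frozenset({"s1"}),
    "C": frozenset({"s2"}),
}
T_prime = {
    "A": frozenset({"s1", "s2", "s3"}),
    "B": frozenset({"s1"}),
    "D": frozenset({"s2", "s3"}),
}

print(smaller_or_equal(T, T_prime))  # True
print(smaller_or_equal(T_prime, T))  # False
print(congruent(T, T_prime))         # False
```

Class names play no role in the comparison; only the extents do, so two experts who use different names for the same groups still come out congruent.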
Example of Theory Ordering
[Figure: three example theories, T1, T2, and T3, built from classes A, B, C, D, and E, and the ordering among them.]
Whom to believe?
• Just listen to one expert you like
– Easy! Don’t need any reasoning
• Pick an expert you like and everyone who can agree with
this expert
– Choose all experts with theories equivalent to the expert you like
• Choose experts who form the largest set of agreeing
experts
– Find largest equivalence class
• Choose experts whose opinions form the smallest or
largest number of taxa
– Bigger theories account for more taxa
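Finding the largest set of agreeing experts is a grouping problem once theory equivalence can be tested. A minimal sketch, assuming each theory is given as a mapping from class name to a set of specimens, so that equivalent theories have identical collections of extents; the experts and specimens are hypothetical.

```python
from collections import defaultdict

def largest_agreeing_group(theories):
    """Group experts with identical class extents and return the biggest group."""
    groups = defaultdict(list)
    for expert, classes in theories.items():
        # The set of extents ignores class names, so renamed classes still match.
        groups[frozenset(classes.values())].append(expert)
    return max(groups.values(), key=len)

# Hypothetical experts and specimens, invented for illustration.
theories = {
    "expert_1": {"A": frozenset({"s1", "s2"}), "B": frozenset({"s1"})},
    "expert_2": {"X": frozenset({"s1", "s2"}), "Y": frozenset({"s1"})},
    "expert_3": {"A": frozenset({"s1", "s2", "s3"})},
}

group = largest_agreeing_group(theories)
print(sorted(group))
```

When two groups tie in size, `max` returns whichever was formed first, so a real authority picker would need an explicit tie-breaking rule.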
Future Work
• Vetting the ontology
• Adding ‘intelligence’ to tools which build
correspondences
• Implementing authority picker in a
workflow system
• Efficient algorithm for determining theory
hierarchy
Thanks! Questions?
• I’d like to acknowledge:
– Bertram Ludäscher, Shawn Bowers, Serguei Krivov,
Richard Waldinger for many discussions on this topic.
– Jessie Kennedy, Robert Kukla, Trevor Patterson,
Martin Graham for their work on the Taxon Concept
Schema
– Bob Peet for the Ranunculus data set
– Kirsten Menger-Anderson for Chicken Drawing
– NSF, under SEEK awards 0225676, 0225665,
0225635, and 0533368
Where In Greece Can I Find Ranunculus aquatilis?
[Figure with labels R. aquatilis and R. trichophyllus]
Beginnings of Biological Taxonomy
• Egypt, 1500 BC: Ebers medical papyrus,
classification of medicinal plants
• Greece, 300 BC: Aristotle and
Theophrastus
• China, 200 BC: Erh-ya dictionary (second
century BC)