Toward Using Ontologies to Reason About Disagreeing Taxonomic Experts
Dave Thau, UC Davis
thau@learningsite.com
NeSC RDF Workshop, June 8, 2006

Why Did the Chicken Cross the Road?
• To get to the other side.
• To boldly go where no chicken has gone before.
• To prove it could never reach the other side. (Zeno of Elea)
• Chickens, over great periods of time, have been naturally selected so that they are now predisposed to cross roads.

Why Did the Taxonomists Cross the Road?
So they could properly identify the chicken.

Overview
• Quick primer on taxonomy
• Some types of disagreements between experts
• Problems this causes
• Using an ontology to represent taxonomic opinions
• Using the ontology to compare experts’ theories

Linnaean Taxonomy Basics
Ranks: kingdom, phylum, class, order, family, genus, species, variety (and others!)
• Family rank: Canidae
• Genus rank: Vulpes, Canis, Nyctereutes
• Species rank: Canis lupus, Canis latrans, Canis familiaris

Things You May Not Know
• There is no big list of all the known species in the world
• This is partly because people don’t agree on the definitions of the species, genera, etc.
• An estimated 6% of the known taxa are changed every year
• This has been going on since Linnaeus published his classification scheme in 1735

Types of Disagreement: The Basics
Benson, 1948:
• Ranunculus aquatilis
  – R.a. var calvescens
  – R.a. var capillaceus
FNA-03, 1997:
• Ranunculus aquatilis
  – R.a. var aquatilis
  – R.a. var diffusus
  – R.a. var hispidulus
With five basic relationships and 3 × 4 = 12 pairs of taxa, this results in 5^12 (more than 240 million) possible sets of relationships.
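The count on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming the five basic relationships are congruent, includes, is included in, overlaps, and disjoint, and that exactly one of them is chosen for every pair of one Benson taxon and one FNA taxon. The constant and function names are hypothetical and not part of the talk.

```python
from itertools import product

# Assumed labels for the five basic relationships between two taxa.
RELATIONS = ("congruent", "includes", "is_included_in", "overlaps", "disjoint")

# Taxa listed on the slide for each authority.
BENSON_1948 = ("R. aquatilis", "R.a. var calvescens", "R.a. var capillaceus")
FNA_1997 = (
    "R. aquatilis",
    "R.a. var aquatilis",
    "R.a. var diffusus",
    "R.a. var hispidulus",
)


def count_relationship_sets(taxa_a, taxa_b, relations=RELATIONS):
    """Count the ways to assign one relation to every cross-authority pair."""
    pairs = list(product(taxa_a, taxa_b))  # 3 x 4 = 12 pairs for this slide
    return len(relations) ** len(pairs)


total = count_relationship_sets(BENSON_1948, FNA_1997)
print(total)  # 244140625, i.e. 5**12
```

Most of these assignments are inconsistent with one another, which is why a reasoner is useful for ruling them out.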
[Figure: diagrams of the basic relationships between two taxa A and B; the surviving labels are "overlap" and "disjoint".]

Types of Disagreement: Splitting and Lumping
Benson, 1948:
• Ranunculus flammula
  – R.f. var filiformis
  – R.f. var genuiinus
  – R.f. var ovalis
Kartesz, 2004:
• Ranunculus flammula
  – R.f. var filiformis
  – R.f. var flammula
Peet, 2005:
• B.1948:R.flammula is congruent to K.2004:R.flammula
• B.1948:R.f.genuiinus is included in K.2004:R.f.flammula
• B.1948:R.f.ovalis is included in K.2004:R.flammula
• B.1948:R.f.filiformis is congruent to K.2004:R.f.filiformis

Types of Disagreement – Differing Extents
Benson, 1948:
• Ranunculus glaberrimus
  – R.g. var reconditus
  – R.g. var ellipticus
  – R.g. var typicus
Kartesz, 2004:
• Ranunculus glaberrimus
  – R.g. var ellipticus
  – R.g. var glaberrimus
Peet, 2005:
• B.1948:R.glaberrimus contains K.2004:R.glaberrimus
• B.1948:R.g.ellipticus is congruent to K.2004:R.g.ellipticus
• B.1948:R.g.typicus is congruent to K.2004:R.g.glaberrimus
• B.1948:R.g.reconditus is congruent to K.2004:R.triternatus

Impact on Data Analysis
• Can’t find data
  – If A contains B, a search on A should retrieve B
• Can’t aggregate data
  – If B is contained by A, you should be able to combine data from B into A

What to do in case of conflicting experts?
• Just listen to one expert you like
• Pick an expert you like and everyone who agrees with this expert (and each other)
• Choose experts who form the largest set of agreeing experts
• Choose experts whose opinions encompass the smallest or largest number of taxa

How can we find out which experts agree?
• Represent taxonomy using logic
• Use the logic to determine relations between expert opinions (theories)
  – Two theories may conflict
  – Two theories may be equivalent
  – One theory may encompass another

Representation Details
• Based on the Taxon Concept Schema (TCS)
• Represented using Description Logic
  – (OWL DL)

Example Ontology
[Figure: example ontology. Surviving labels: Taxon, Taxon Description, Specimen, hasSpecies, hasGenus, "Things in the species Ranunculus glaberrimus", "Things in the genus Ranunculus", "Ranunculus glaberrimus (Kartesz, 2004)", "Ranunculus (Kartesz, 2004)".]

Fundamental Assumptions
• Each taxon class has at least one instance
• Each taxon class is defined as the union of its subclasses
• A class’s subclasses are defined to be mutually disjoint

Questions the Ontology Can Answer
• Find the subclasses of a class
• Make sure the taxonomy is consistent
• See if two classes are equivalent
• Can also use it to compare expert opinions

Compatible Theories
• A theory is one expert’s set of classes and relations and all they imply.
• A set of theories is compatible if
  – Each theory is consistent and
  – The correspondences between classes in the theories do not cause inconsistency.

Example Incompatibility
Benson, 1948:
• Ranunculus hydrocharoides
  – R.h. var natans
  – R.h. var stolonifer
  – R.h. var typicus
Kartesz, 2004:
• Ranunculus hydrocharoides
  – R.h. var stolonifer
  – R.h. var typicus
Peet, 2005:
• B.1948:R.h.stolonifer is congruent to K.2004:R.h.stolonifer
• B.1948:R.h.typicus is congruent to K.2004:R.h.typicus
• B.1948:R.hydrocharoides is congruent to K.2004:R.hydrocharoides
Under the fundamental assumptions these correspondences cannot all hold: R.h. var natans has at least one instance and is disjoint from the other two varieties, so B.1948:R.hydrocharoides is strictly larger than K.2004:R.hydrocharoides.

Example Incompatibility
Benson, 1948:
• Ranunculus
  – Ranunculus macranthus
  – Ranunculus petiolaris
  – …
Kartesz, 2004:
• Ranunculus
  – Ranunculus petiolaris
  – …
Peet, 2005:
• B.1948:R.macranthus contains K.2004:R.petiolaris
• B.1948:R.petiolaris is contained by K.2004:R.petiolaris
B.48:R.petiolaris is contained by K.04:R.petiolaris, which is contained by B.48:R.macranthus. This contradicts "B.48:R.macranthus and B.48:R.petiolaris are disjoint."

Inferring Unstated Correspondences
Benson, 1948:
• Ranunculus arizonicus
  – R.a. var chihuahua
  – R.a. var typicus
Kartesz, 2004:
• Ranunculus arizonicus
Peet, 2005:
• B.1948:R.a.typicus is included in K.2004:R.arizonicus
• B.1948:R.arizonicus is congruent to K.2004:R.arizonicus

Comparing Theories
• Given two compatible theories, T and T’:
  – The theories are congruent if each class in theory T is equivalent to one class in T’ (and vice versa).
  – T is smaller than T’ if each class in T either equals or is contained by a class in T’.

Example of Theory Ordering
[Figure: example ordering of three theories T1, T2, and T3 over classes A, B, C, D, and E.]

Whom to believe?
• Just listen to one expert you like
  – Easy! Don’t need any reasoning
• Pick an expert you like and everyone who can agree with this expert
  – Choose all experts with theories equivalent to the expert you like
• Choose experts who form the largest set of agreeing experts
  – Find largest equivalence class
• Choose experts whose opinions form the smallest or largest number of taxa
  – Bigger theories account for more taxa

Future Work
• Vetting the ontology
• Adding ‘intelligence’ to tools which build correspondences
• Implementing authority picker in a workflow system
• Efficient algorithm for determining theory hierarchy

Thanks! Questions?
• I’d like to acknowledge:
  – Bertram Ludäscher, Shawn Bowers, Serguei Krivov, Richard Waldinger for many discussions on this topic
  – Jessie Kennedy, Robert Kukla, Trevor Patterson, Martin Graham for their work on the Taxon Concept Schema
  – Bob Peet for the Ranunculus data set
  – Kirsten Menger-Anderson for the chicken drawing
  – NSF, under SEEK awards 0225676, 0225665, 0225635, and 0533368

Where in Greece Can I Find Ranunculus aquatilis?
[Figure: panels labeled R. aquatilis and R. trichophyllus.]

Beginnings of Biological Taxonomy
• Egypt, 1500 BC: Ebers medical papyrus, classification of medical plants
• Greece, 300 BC: Aristotle and Theophrastus
• China, 200 BC: Erh-ya dictionary (second century BC)
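Supplementary sketch: the definitions from the Comparing Theories slide can be tried out on a toy model. This is only an illustration under loud assumptions: each class is modelled as an explicit finite set of specimen identifiers, which stands in for the OWL DL reasoning the talk describes, and the experts, class names, and specimens below are all hypothetical.

```python
def is_smaller_or_equal(t, t_prime):
    """T is smaller than T': each class in T equals or is contained by a class in T'."""
    return all(any(c <= d for d in t_prime.values()) for c in t.values())


def is_congruent(t, t_prime):
    """Each class in T has an equal class in T', and vice versa."""
    return set(t.values()) == set(t_prime.values())


def largest_agreeing_group(theories):
    """Group experts whose theories are congruent and return the largest group."""
    groups = []
    for name, theory in theories.items():
        for group in groups:
            # Compare against the first member; congruence is an equivalence.
            if is_congruent(theory, theories[group[0]]):
                group.append(name)
                break
        else:
            groups.append([name])
    return max(groups, key=len)


# Hypothetical theories: class name -> frozenset of specimen identifiers.
theories = {
    "expert_1": {"X": frozenset({"s1", "s2"}), "Y": frozenset({"s3"})},
    "expert_2": {"P": frozenset({"s1", "s2"}), "Q": frozenset({"s3"})},
    "expert_3": {"Z": frozenset({"s1", "s2", "s3"})},
}

print(is_congruent(theories["expert_1"], theories["expert_2"]))
print(is_smaller_or_equal(theories["expert_1"], theories["expert_3"]))
print(is_smaller_or_equal(theories["expert_3"], theories["expert_1"]))
print(largest_agreeing_group(theories))
```

In this toy data, expert_1 and expert_2 split the specimens the same way under different names, while expert_3 lumps everything into one class, so expert_1's theory is smaller than expert_3's but not the reverse.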