Ontology Quality by Detection of
Conflicts in Metadata
Budak I. Arpinar
Karthikeyan Giriloganathan
Boanerges Aleman-Meza
LSDIS lab
Computer Science
University of Georgia, USA
EON’2006
Edinburgh, Scotland, May 22, 2006
Co-located with WWW-2006
Motivation
• Ontologies with over 1 million entities are increasingly
appearing
• TAP, SWETO, GlycO, UniProt
• Quality Concerns:
– Entity disambiguation
– Which ontologies are available? (i.e., search & ranking)
– Inconsistency checking (i.e., in OWL)
– Conflict detection
Searching and Ranking Documents based on Semantic Relationships, Boanerges Aleman-Meza, ICDE Ph.D. Workshop 2006
… Motivation
• “Representing, identifying, discovering, validating, and exploiting
complex relationships are important issues related to realizing the
full power of the Semantic Web, and can help close the gap
between highly separated information retrieval and decision-making steps” [Sheth, Arpinar & Kashyap 2003]
• “The Web is decentralized, allowing anyone to say anything. As a
result, different viewpoints may be contradictory, or even false
information may be provided. In order to prevent agents from
combining incompatible data or from taking consistent data and
evolving it into an inconsistent state, it is important that
inconsistencies can be detected automatically” [W3C 2004]
• “… these problems manifest themselves in various ways, including
poor recall of available resources and inconsistency of search
results. They arise due to errors, omissions and ambiguities in the
metadata…” [Currier & Barton 2003]
Our Approach
• Approach: Detection of conflicting relationships
– or conflicts in sequences of relationships
• How? User-defined rules are validated against a
populated ontology
– These rules are domain-dependent
• Goal: By detecting conflicting data, a user can take
action to improve the quality of the ontology
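A minimal sketch of the idea, assuming rules can be modeled as plain functions over a set of triples (the system described in these slides expresses its rules in RuleML; this is not that implementation). The names `Triple`, `validate`, and `no_self_marriage`, and the sample facts, are hypothetical.

```python
from typing import Callable, Iterable, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A rule inspects the populated ontology and returns conflict reports.
Rule = Callable[[set[Triple]], list[str]]


def validate(triples: set[Triple], rules: Iterable[Rule]) -> list[str]:
    """Run every user-defined rule and collect the conflicts it reports."""
    reports: list[str] = []
    for rule in rules:
        reports.extend(rule(triples))
    return reports


def no_self_marriage(triples: set[Triple]) -> list[str]:
    """Hypothetical domain rule: nobody is marriedTo themselves."""
    return sorted(
        f"{t.subject} is marriedTo itself"
        for t in triples
        if t.predicate == "marriedTo" and t.subject == t.obj
    )


# Made-up facts for illustration only.
facts = {
    Triple("John", "marriedTo", "John"),
    Triple("Bill", "marriedTo", "Mary"),
}
reports = validate(facts, [no_self_marriage])
```

The reports are only a list for the user to review; deciding which statement to correct remains a manual step.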
Example of Conflict Identification
[Figure: family-relationship graph over John, Claura, Mary, and Bill, with the relationships fatherOf (twice), marriedTo (twice), motherOf, and fatherinLawOf, and a CONFLICT marker]
A few definitions: ‘simplification’
• An RDF triple is a simplification
• Basically, composing relationships
– Leading to simple relations yet somewhat arbitrary
[Figure: Chris, Williams, and RepublicanParty connected by the relationships votedFor, memberOf, and supporterOf]
Statement Simplification
• There could be simplifications of the form:
statement_1 ∧ statement_2 ∧ … ∧ statement_n → statement_t
• In this case statement_t is a simplification
– this is dependent on expert knowledge
– this is not in the traditional reasoning approach
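A simplification rule of this form can be sketched as a conjunctive pattern match over triples. This is an illustrative sketch only, not the RuleML/Mandarax machinery used by the system. The `match` and `simplify` helpers are hypothetical, and the sample facts assume one plausible reading of the Chris/Williams/RepublicanParty example (Chris votedFor Williams, Williams memberOf RepublicanParty).

```python
def match(pattern, triple, binding):
    """Unify one (s, p, o) pattern with a triple; terms starting with '?' are variables."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or fail if it is already bound to something else.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def simplify(triples, body, head):
    """Derive statement_t for the rule body[0] AND ... AND body[n-1] -> head."""
    bindings = [{}]
    for pattern in body:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return {tuple(b.get(term, term) for term in head) for b in bindings}


# Assumed reading of the example figure; the rule itself is expert-supplied.
facts = {
    ("Chris", "votedFor", "Williams"),
    ("Williams", "memberOf", "RepublicanParty"),
}
derived = simplify(
    facts,
    body=[("?x", "votedFor", "?y"), ("?y", "memberOf", "?z")],
    head=("?x", "supporterOf", "?z"),
)
```

The derived statement is only as trustworthy as the expert rule behind it, which is why such simplifications fall outside traditional reasoning.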
Statement Simplification
[Figure: example graph over Immigrant, Person, FinancialOrganization, BusinessOrganization, JudicialOrganization, and MoneyLaundering, with the relationships suspected, associated, multipleDeposits, owner, works, and underInvestigation]
Using ‘simplification’ for detection of conflict
• Two sets of triples T1 and T2 are in conflict if their
simplifications S(T1) → s1 and S(T2) → s2 are mutually non-agreeable
• Two simplifications s1 and s2 are mutually non-agreeable if,
taken together, they violate domain constraints
T: A set of triples
S: A function denoting the process of simplification
s: The result of simplification (S(T) → s)
U: Constraints expressed in an ontology (e.g., the property
‘biologicalMother’ is unique)
E: Constraints supplied by an expert (e.g., person(x) can never
do action(y))
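The definition can be sketched for the uniqueness constraint U alone, under strong simplifying assumptions: each simplification is a single (subject, property, object) statement, and `S` is a toy function written for this example. The property `gaveBirthTo` and all instance names are hypothetical; expert constraints E would be added as further checks.

```python
def S(T):
    """Toy simplification (an assumption): reduce a set of triples to one statement."""
    for subj, prop, obj in sorted(T):
        if prop == "gaveBirthTo":  # hypothetical property, inverted by the rule
            return (obj, "biologicalMother", subj)
        if prop == "biologicalMother":
            return (subj, prop, obj)
    raise ValueError("no simplification rule applies")


def non_agreeable(s1, s2, U):
    """True if s1 and s2 give different values for a property that U declares unique."""
    same_subject_and_property = s1[0] == s2[0] and s1[1] == s2[1]
    return same_subject_and_property and s1[1] in U and s1[2] != s2[2]


def in_conflict(T1, T2, U):
    """T1 and T2 conflict if their simplifications are mutually non-agreeable."""
    return non_agreeable(S(T1), S(T2), U)


U = {"biologicalMother"}  # unique properties, as in the example above
T1 = {("Ann", "gaveBirthTo", "Tom")}
T2 = {("Tom", "biologicalMother", "Sue")}
T3 = {("Tom", "biologicalMother", "Ann")}

conflict_12 = in_conflict(T1, T2, U)  # different mothers for Tom
conflict_13 = in_conflict(T1, T3, U)  # both simplify to the same statement
```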
Defining Rules for Simplification
Types of Conflicts
• Property Assertion
• Class Assertion
• Statement Assertion
Types of conflicts: Property Assertion
• Establish constraints on properties
– based on the semantics of their intended/expected use
– thus, subjective
• Examples:
– ‘asymmetric’ constraint
– ‘disjoint’ constraint
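The two example constraints can be sketched as checks over (subject, property, object) tuples. The function names and the sample facts are hypothetical; which properties count as asymmetric or mutually disjoint is a domain decision made by the user.

```python
def asymmetric_violations(triples, prop):
    """Pairs (a, b) asserted in both directions for a property declared asymmetric."""
    return sorted(
        (s, o)
        for s, p, o in triples
        # s <= o reports each offending pair once and also catches self-loops.
        if p == prop and s <= o and (o, prop, s) in triples
    )


def disjoint_violations(triples, prop_a, prop_b):
    """Pairs (s, o) linked by two properties that were declared disjoint."""
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == prop_a and (s, prop_b, o) in triples
    )


# Made-up facts for illustration only.
facts = {
    ("John", "fatherOf", "Bill"),
    ("Bill", "fatherOf", "John"),
    ("Mary", "motherOf", "Bill"),
    ("Mary", "fatherOf", "Bill"),
}
asym = asymmetric_violations(facts, "fatherOf")
disj = disjoint_violations(facts, "motherOf", "fatherOf")
```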
Types of Conflicts: Class Assertion
• Establish constraints on classes
– based on the semantics of their intended/expected use
– also, subjective
• Examples:
– ‘disjoint’ classes (schema or instances)
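An instance-level check for disjoint classes might look like the sketch below (the schema-level case is not covered). The class names, the instance name, and the function `disjoint_class_violations` are hypothetical.

```python
from collections import defaultdict


def disjoint_class_violations(type_assertions, disjoint_pairs):
    """Instances asserted to belong to two classes that were declared disjoint."""
    classes_of = defaultdict(set)
    for instance, cls in type_assertions:
        classes_of[instance].add(cls)
    return sorted(
        (instance, a, b)
        for instance, classes in classes_of.items()
        for a, b in disjoint_pairs
        if a in classes and b in classes
    )


# Made-up type assertions; the disjointness declaration is user-supplied.
types = [
    ("acme", "Person"),
    ("acme", "Organization"),
    ("john", "Person"),
]
violations = disjoint_class_violations(types, [("Person", "Organization")])
```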
Types of Conflicts: Statement Assertion
• Stating that, under
certain conditions, one
or more statements are
conflicting
• Example: a person
cannot be a superior
and a friend to “John”
at the same time
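The “John” example can be sketched as a rule that flags any subject holding all of a set of mutually exclusive relationships to the same target. The property names `superiorOf` and `friendOf`, the sample facts, and the function name are assumptions made for illustration.

```python
def statement_conflicts(triples, target, exclusive_props):
    """Subjects that hold every property in exclusive_props toward the same target."""
    held = {}
    for s, p, o in triples:
        if o == target and p in exclusive_props:
            held.setdefault(s, set()).add(p)
    return sorted(s for s, props in held.items() if props == set(exclusive_props))


# Made-up facts for illustration only.
facts = {
    ("Ann", "superiorOf", "John"),
    ("Ann", "friendOf", "John"),
    ("Bob", "friendOf", "John"),
    ("Ann", "friendOf", "Bob"),
}
conflicting = statement_conflicts(facts, "John", {"superiorOf", "friendOf"})
```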
System Architecture
[Figure: system architecture. Components: User Interface, CONFIDER API, CONFLICT ENGINE, SERIALIZER, MANDARAX API, JENA API. Inputs: Ontology, Semantic Metadata, Relationship Ontology, Facts, RULES (RuleML), SIMPLIFICATION RULES (RuleML)]
Performance Evaluation
• Tested with an ontology of 6K entities and
11K relationships
– subset of SWETO ontology
– domain of computer science publications
• Sample conflict detection of:
– the same paper must not be published in two different
publication venues
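The sample rule can be sketched as a grouping check. The property name `publishedIn` and the sample data are hypothetical and are not taken from SWETO; the evaluated system expresses this rule in RuleML rather than Python.

```python
from collections import defaultdict


def multi_venue_papers(triples, prop="publishedIn"):
    """Map each paper asserted in more than one venue to its sorted venue list."""
    venues = defaultdict(set)
    for paper, p, venue in triples:
        if p == prop:
            venues[paper].add(venue)
    return {paper: sorted(v) for paper, v in venues.items() if len(v) > 1}


# Made-up publication metadata for illustration only.
facts = {
    ("paper1", "publishedIn", "VenueA"),
    ("paper1", "publishedIn", "VenueB"),
    ("paper2", "publishedIn", "VenueA"),
    ("paper2", "authoredBy", "someone"),
}
suspects = multi_venue_papers(facts)
```

A hit does not say which statement is wrong; it may also indicate an entity disambiguation problem, such as two distinct papers merged under one identifier.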
Conflict Identification Results
Statement Provenance
Performance Evaluation
[Figure: “Triples vs Time” chart. x-axis: No of Triples (0 to 1200); y-axis: Log (Time in milliseconds) (0 to 8). Data labels (ascending): 4.651849588, 5.955244091, 6.508925185, 6.730027336, 6.808724663, 6.875508877, 6.89743981, 7.030853263, 7.049989218, 7.075019174]
• with an increasing number of conflicts (500 triples)
Conclusions and Discussion
• Defined types of conflicts
• Described a rule-based approach to identify
the conflicts
Findings:
• Scalability limited by other tools (Mandarax)
• Applicable to refining extraction-based
approaches for populating ontologies
• Very domain-dependent and subjective
method
Comments, Questions, …