Social Tags and Linked Data for Ontology Development

Social Tags and Linked Data for
Ontology Development:
A Case Study in the Financial
Domain
Andrés García-Silva†, Leyla Jael García-Castro±,
Alexander García*, Oscar Corcho†
†{hgarcia, ocorcho}@fi.upm.es
Ontology Engineering Group
Universidad Politécnica de Madrid, Spain
± leylajael@gmail.com
Universitat Jaume I, Castellón de la Plana, Spain
*alexgarciac@gmail.com
State University, Florida, USA
June 2014
Introduction
Folksonomies
Web 2.0
• User-generated content
• Social networks
• Tools for organizing, sharing & discovering information
Tagging Systems
[Figure: example tags (Tutorial, Java, Programming language, Java Persistent Access, Database, Knowledge Base) forming a folksonomy]
Introduction
Folksonomies
Folksonomies as a source of knowledge
• Vocabulary emerges around resources and users
Golder and Huberman (2006), Marlow et al. (2006)
• Maintained by a large user community
• Flexible (not restricted)
• Up-to-date
• Emergent semantics from the aggregation of
individual classifications
Gruber (2007), Mika (2007), Specia and Motta (2007)
State of the art
Folksonomies
Statistical-based (Folksonomy → Ontology)
• Tag similarity measures (two tags are related if...; which relation?)
  Cattuto et al. (2008), Markines et al. (2009), Körner et al. (2010), Benz et al. (2011)
• Ontology generation
  Heymann and Garcia-Molina (2006), Begelman et al. (2006), Hamasaki et al. (2007), Jäschke et al. (2008), Kennedy et al. (2007), Mika (2007), Benz et al. (2010), Limpens et al. (2010)
Ontology-based (Folksonomy + Ontology)
  Angeletou et al. (2008), Cantador et al. (2008), García-Silva et al. (2009), Maala et al. (2008), Passant (2007), Tesconi et al. (2008)
Hybrid approaches
  Giannakidou et al. (2008), Specia and Motta (2007)
State of the art
Folksonomies
Approach | Type | Auto | Data Src. | Select. & Cleaning | Context Ident. | Disambiguation | Sem. Ident. | Output | Evaluation | Domain Knowledge
Mika, 2007 | Stat | Yes | Del,Oth | Yes | Yes | No | Yes | Onto | Desc Study | No
Hamasaki et al., 2007 | Stat | Yes | Pol | No | Yes | Yes | No | Onto | Task-based | No
Jäschke et al., 2008 | Stat | Yes | Del,Bib | Yes | Yes | No | No | Hier | Desc Study | No
Limpens et al., 2010 | Stat | Semi | Oth | No | No | Yes | Yes | Enri | Prec/Rec | No
Begelman et al., 2006 | Stat | Yes | Del,Raw | Yes | Yes | No | No | Clus | Desc Study | No
Kennedy et al., 2007 | Stat | Yes | Fli | Yes | Yes | Yes | Yes | Inst | Prec/Rec | No
Heymann & Garcia-Molina, 2006 | Stat | Yes | Del,Cit | No | Yes | No | No | Hier | Task-based | No
Benz et al., 2010 | Stat | Yes | Del | No | Yes | Yes | Yes | Hier | Prec/Rec | No
Giannakidou et al., 2008 | Hyb | Yes | Fli | Yes | Yes | Yes | No | Clus | No | No
Specia & Motta, 2007 | Hyb | Semi | Del,Fli | Yes | Yes | Yes | Yes | Onto | Desc Study | No
Angeletou et al., 2008 | Ont | Yes | Fli | Yes | Yes | Yes | Yes | Enri | Prec/Rec | No
Cantador et al., 2008 | Ont | Yes | Fli,Del | Yes | Yes | No | Yes | Inst | Prec/Rec | No
Tesconi et al., 2008 | Ont | Yes | Del | Yes | Yes | Yes | Yes | Enri | Prec/Rec | No
Passant, 2007 | Ont | No | Oth | Yes | Yes | Yes | Yes | Enri | Desc Study | No
Maala et al., 2008 | Ont | Yes | Fli | Yes | Yes | No | Yes | Enri | Desc Study | No
Limitations
Statistical-based
• Most of the approaches do not distinguish
between classes and instances
• Relation semantics is limited to some
types and is not precisely defined
• No domain knowledge
Ontology-based
• All the approaches produce either
enrichments or instances (No Classes)
• Relations are not identified
• No domain knowledge
Hybrid
• Semi-automatic ontology generation
• No domain knowledge
Proposal
Goal: Generate a domain baseline ontology, containing classes and
relationships, out of folksonomy information.
Process:
• Domain experts provide domain-relevant resources (URLs).
• Terminology Extraction: from the folksonomy and those resources, produce a list of domain terms.
• Semantic Elicitation: the domain terms drive the extraction of domain classes and relationships from Linked Open Data (LOD)*.
*"Linking Open Data cloud diagram", by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Benefits
We propose a process to extract domain knowledge from large and
generic knowledge bases which is driven by the domain
terminology in the folksonomy
• It may save time in the ontology development process
• It allows ontology engineers to understand the domain with a limited
participation of domain experts.
• Smaller and more focused ontologies which are potentially easier to
understand and maintain.
• Complex queries and reasoning tasks may execute faster on smaller
data sets
• In observance of methodological practice, our technique harvests
community knowledge and reuses existing ontologies
• The Ontology has links to external classes and relationships available
in the Linked Open Data cloud.
Challenges
Problem: Tags lack semantics
• Ambiguity
• Synonyms
• Acronyms
• Morphological variations: plurals, singulars, verb conjugations
• Misspellings
Approach
Terminology Extraction
Goal: To extract domain terminology from the folksonomy
Folksonomy: A ⊆ U × T × R, with graph G = (V, E), where V = U ∪ T ∪ R and E = {(u, t, r) | (u, t, r) ∈ A}
Resource graph: G' = (V', E'), where V' = R and E' = {(ri, rj) | ∃ u, tm, tn ((u, tm, ri) ∈ A ∧ (u, tn, rj) ∈ A ∧ tm = tn)}
Spreading Activation
• Seeds: domain-relevant resources provided by domain experts.
• Nodes are weighted with an activation value, which is used to start the search.
• The activation value spreads to adjacent nodes through an activation function.
• Activation function: depends on the tags shared between the visited node and the source node, and on the
source node's activation value.
• If the activation function exceeds a threshold, the node is marked as activated and the spreading continues
to its adjacent nodes.
• The tags of activated nodes are collected as domain terms.
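The terminology extraction step can be illustrated with a minimal Python sketch. The slides only state that the activation function depends on the shared tags and on the source node's activation value; the concrete formula used here (source activation multiplied by the Jaccard overlap of the two tag sets), the seed activation of 1.0, the default threshold, and all function names are assumptions made for illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_resource_graph(assignments):
    """Link two resources when the same user applied the same tag to both."""
    by_user_tag = defaultdict(set)   # (user, tag) -> resources
    tags_of = defaultdict(set)       # resource -> tags
    for user, tag, resource in assignments:
        by_user_tag[(user, tag)].add(resource)
        tags_of[resource].add(tag)
    neighbours = defaultdict(set)
    for resources in by_user_tag.values():
        for ri, rj in combinations(sorted(resources), 2):
            neighbours[ri].add(rj)
            neighbours[rj].add(ri)
    return neighbours, tags_of


def spread_activation(assignments, seeds, threshold=0.5):
    """Return (activation per activated resource, collected domain terms)."""
    neighbours, tags_of = build_resource_graph(assignments)
    activation = {seed: 1.0 for seed in seeds}  # assumed seed weight
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        for node in neighbours[source]:
            if node in activation:
                continue
            shared = tags_of[source] & tags_of[node]
            union = tags_of[source] | tags_of[node]
            # Assumed activation function: source activation x Jaccard overlap.
            value = activation[source] * len(shared) / len(union)
            if value > threshold:
                activation[node] = value
                queue.append(node)
    terms = set()
    for node in activation:
        terms |= tags_of[node]
    return activation, terms
```

Called with a list of (user, tag, resource) triples and the expert-provided seed resources, the function returns the activated resources and the union of their tags as candidate domain terms. A higher threshold keeps the spreading closer to the seeds.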
Approach
Semantic Elicitation
Goal: To relate domain terms (tags) to DBpedia resources
• Normalize the tag to the standard notation of DBpedia resource titles
• Search with SPARQL for a resource whose label equals the normalized tag
  • If no such resource exists: use a spelling suggestion service and search again
  • If a resource exists: check whether it is related to a disambiguation resource
    • If true: retrieve the disambiguation candidates and select the candidate most similar to the tag context
      • Vector space model
      • Candidate resources are represented by their textual descriptions
      • The tag is represented by its context (i.e., co-occurring tags)
      • The most similar candidate is selected using cosine similarity
    • If false: select the resource (the default sense in Wikipedia)
A. García-Silva, I. Cantador, Ó. Corcho (2012). Enabling folksonomies for knowledge extraction: A semantic grounding approach.
International Journal on Semantic Web and Information Systems 8 (3), 24-41
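The disambiguation step can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: vectors are raw term counts (the slides do not say which weighting is used), the tokenizer is a simple regular expression, and the candidate names and descriptions in the usage note are invented examples.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def pick_candidate(context_tags, candidates):
    """Pick the candidate whose description is closest to the tag context.

    context_tags: tags co-occurring with the ambiguous tag.
    candidates: mapping from candidate resource name to its description.
    """
    context = bag_of_words(" ".join(context_tags))
    return max(
        candidates,
        key=lambda name: cosine(context, bag_of_words(candidates[name])),
    )
```

For example, given the context tags `["finance", "loans", "deposits"]` and two hypothetical candidates for the tag "bank", one described as a financial institution that accepts deposits and makes loans and one described as land alongside a river, `pick_candidate` returns the financial sense because only its description shares terms with the context.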
Approach
Semantic Elicitation
Goal: To identify classes from resources
• Use a SPARQL ASK query to verify whether the entity
is a class
• If not:
• Create queries to traverse all the
possible paths of equivalence
relations (e.g., owl:sameAs) between the entity and a
class in the RDF graph
Philipp Heim, Sebastian Hellmann, Jens Lehmann, Steffen Lohmann and Timo Stegemann. RelFinder: Revealing Relationships in RDF Knowledge Bases. In: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), pages 182-187. Springer, Berlin/Heidelberg, 2009.
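# Prefixes required by each query; <resource> is a placeholder for the resource IRI.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
# Query 1: is the resource itself a class?
ASK { <resource> rdf:type rdfs:Class }
# Query 2: classes reachable through one equivalence relation
SELECT ?class
WHERE { <resource> ?rel1 ?class .
        ?class rdf:type rdfs:Class .
        FILTER (?rel1 = owl:sameAs) }
# Query 3: classes reachable through two equivalence relations
SELECT ?class
WHERE { <resource> ?rel1 ?node .
        ?node ?rel2 ?class .
        ?class rdf:type rdfs:Class .
        FILTER ((?rel1 = owl:sameAs) &&
                (?rel2 = owl:sameAs)) }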
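Queries 2 and 3 follow a regular pattern, so queries for longer paths could be generated programmatically. The Python sketch below builds such a query string for a given number of hops. The function name, the `hops` and `relation` parameters, and the use of `SELECT DISTINCT` are assumptions for illustration; like the queries above, it follows outgoing links only and does not execute anything against an endpoint.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def class_path_query(resource_iri, hops, relation="owl:sameAs"):
    """Build a SELECT query for classes reachable in exactly `hops` steps."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    # Chain: <resource> -> ?node1 -> ... -> ?class
    nodes = [f"<{resource_iri}>"]
    nodes += [f"?node{i}" for i in range(1, hops)]
    nodes.append("?class")
    triples = [
        f"  {subject} ?rel{i} {obj} ."
        for i, (subject, obj) in enumerate(zip(nodes, nodes[1:]), start=1)
    ]
    # Every relation on the path must be the chosen equivalence relation.
    filters = " && ".join(
        f"(?rel{i} = {relation})" for i in range(1, hops + 1)
    )
    body = "\n".join(
        triples
        + ["  ?class rdf:type rdfs:Class .", f"  FILTER ({filters})"]
    )
    return f"{PREFIXES}SELECT DISTINCT ?class WHERE {{\n{body}\n}}"
```

Calling `class_path_query("http://dbpedia.org/resource/Bank", 2)` (the IRI is only an example) yields a query with the same shape as Query 3. Iterating `hops` from 1 up to a small maximum covers the paths of increasing length.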
Approach
Semantic Elicitation
Goal: To identify relations between
classes
• For each pair of classes
• Create queries to traverse all the
possible paths between two
classes in the RDF graph, and
retrieve the relationships.
Caveats
• The search may add information that is not relevant to the domain
to the ontology when:
  • the path is long
  • the path passes through abstract
  concepts or relationships, e.g.:
    • cyc:ObjectType
    • umbel:RefConcept
Approach
Semantic Elicitation
Minimizing the risk of adding non-relevant information to the ontology
• Keep the path length short
• Our experiments show satisfactory results with short path lengths that allow us to
enrich the initial set of classes while preserving the precision of the ontology
• Avoid high level concepts
• Create lists of high level concepts collected from the knowledge base vocabularies
to filter out the paths containing those concepts
• Knowledge base core vocabularies are usually well documented
• http://umbel.org/specifications/vocabulary
• http://mappings.dbpedia.org/server/ontology/classes/
• http://www.cyc.com/kb/thing
• Use semantic similarity distances
• Wu and Palmer, 1994 : Depth of the classes and the common subsumer in the taxonomy
• Jiang and Conrath, 1997: subclasses per class, class depth, information content, etc.
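Two of the safeguards above, path filtering and the Wu and Palmer measure, can be sketched in Python. Everything concrete here is an assumption for illustration: the path representation (an alternating list of nodes and relations), the default maximum length of 2, the block list seeded with the two concepts named on the previous slide, the convention that the taxonomy root has depth 1, and the requirement that the taxonomy is a tree given as a child-to-parent mapping.

```python
HIGH_LEVEL_CONCEPTS = {"cyc:ObjectType", "umbel:RefConcept"}


def keep_path(path, max_length=2, blocked=HIGH_LEVEL_CONCEPTS):
    """Accept a path only if it is short and avoids blocked concepts.

    path alternates nodes and relations: [node, rel, node, rel, node, ...].
    """
    hops = (len(path) - 1) // 2
    return hops <= max_length and not any(term in blocked for term in path)


def ancestors(cls, parent_of):
    """Return [cls, parent, ..., root] for a tree-shaped taxonomy."""
    chain = [cls]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def wu_palmer(c1, c2, parent_of):
    """Wu and Palmer similarity: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    chain1 = ancestors(c1, parent_of)
    chain2 = ancestors(c2, parent_of)
    seen = set(chain1)
    # The first ancestor of c2 that is also an ancestor of c1 is the
    # deepest common subsumer.
    lcs = next((c for c in chain2 if c in seen), None)
    if lcs is None:
        return 0.0
    depth_lcs = len(ancestors(lcs, parent_of))  # root has depth 1
    return 2 * depth_lcs / (len(chain1) + len(chain2))
```

With a toy taxonomy in which Company and Bank are both direct subclasses of Organisation, Organisation is a subclass of Agent, and Agent and Event are subclasses of Thing, Company and Bank score 0.75 while Company and Event score about 0.33. A relationship between two classes could then be kept only when the score is above a chosen cut-off.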
Evaluation
Experiment in the financial Domain
[Figure: finance vocabulary, labeled "Input" and "Evaluation"]
Evaluation
Experiment in the financial Domain
[Figure: Terminology Extraction, Finance Ontology, Finance vocabulary]
Evaluation
Inspecting a financial ontology
• We ran the process with an activation threshold of 0.8
• The resulting ontology consists of 187 classes, 378 relations of 8 different types,
and 12 modules.
Evaluation
Inspecting a financial ontology
Ontology Modules Evaluation
Module | Precision (Class)
Organization | 77.80%
Company | 88.50%
Person | 55.60%
Union | 3.74%
Banker | 100%
Human A | 100%
Stock Exchange | 84.60%
Money Transactions | 100%
Country | 100%
Research | 100%
Driver | 0%
Member | 100%
Class Precision = 80.67%, Relation Precision = 96.4%
Conclusions
• We have proposed a method for automatically developing domain
ontologies
• Limited user participation
• We benefit from the aggregation of the individual classifications to
extract an emergent domain vocabulary
• In accordance with methodological guidelines we reuse existing
knowledge (The Web of Data)
• We tap into existing links between data sets to collect related
semantic information
• We avoid, to some extent, semantic mismatches
• We avoid heterogeneous representations
• In practice, we expect the method will be used by ontology engineers to
generate baseline ontologies that can be refined later according to the
ontology requirements.
Future Work
• Develop a method to automatically assess the validity of the relationships
found in the linked data cloud:
  • OpenCyc Stock Exchange is owl:sameAs UMBEL Exchange of User Rights
  • However:
    • Stock Exchange is an organization
    • Exchange of User Rights is an event
• Use semantic similarity measures to decide whether to include the
relationships found along a path between two classes.
• Discover and use datasets in the linked data cloud that cover
the domain of interest.