Social Tags and Linked Data for Ontology Development: A Case Study in the Financial Domain

Andrés García-Silva†, Leyla Jael García-Castro±, Alexander García*, Oscar Corcho†
† {hgarcia, ocorcho}@fi.upm.es, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
± leylajael@gmail.com, Universitat Jaume I, Castellón de la Plana, Spain
* alexgarciac@gmail.com, State University, Florida, USA
June 2014

Introduction: Folksonomies
[Figure labels: Web 2.0; user-generated content; social networks; tools for organizing, sharing and discovering information; tagging systems; sample tags "Tutorial Java Programming language Java Persistent Access Database Knowledge Base"; folksonomy]

Introduction: Folksonomies
Folksonomies as a source of knowledge
• Vocabulary emerges around resources and users
  Golder and Huberman (2006), Marlow et al. (2006)
• Maintained by a large user community
• Flexible (unrestricted vocabulary)
• Up-to-date
• Emergent semantics from the aggregation of individual classifications
  Gruber (2007), Mika (2007), Specia and Motta (2007)

State of the art: Folksonomies
Statistical-based
• Tag similarity measures ("Two tags are related if..."; which relation?)
  Cattuto et al. (2008), Markines et al. (2009), Körner et al. (2010), Benz et al. (2011)
• Ontology generation (folksonomy to ontology)
  Heymann and Garcia-Molina (2006), Begelman et al. (2006), Hamasaki et al. (2007), Jäschke et al. (2008), Kennedy et al. (2007), Mika (2007), Benz et al. (2010), Limpens et al. (2010)
Ontology-based (folksonomy and ontology to ontology)
  Angeletou et al. (2008), Cantador et al. (2008), García-Silva et al. (2009), Maala et al. (2008), Passant (2007), Tesconi et al. (2008)
Hybrid approaches
  Giannakidou et al. (2008), Specia and Motta (2007)

State of the art: Folksonomies
Type: Stat = statistical-based, Hyb = hybrid, Ont = ontology-based.

Approach | Type | Auto | Dat Src. | Select. & Cleaning | Context Ident. | Disambiguation | Sem. Ident | Output | Evaluation | Domain Knowledge
Mika, 2007 | Stat | Yes | Del,Oth | Yes | Yes | No | Yes | Onto | Desc Study | No
Hamasaki et al., 2007 | Stat | Yes | Pol | No | Yes | Yes | No | Onto | Task-based | No
Jäschke et al., 2008 | Stat | Yes | Del,Bib | Yes | Yes | No | No | Hier | Desc Study | No
Limpens et al., 2010 | Stat | Semi | Oth | No | No | Yes | Yes | Enri | Pres/Rec | No
Begelman et al., 2006 | Stat | Yes | Del,Raw | Yes | Yes | No | No | Clus | Desc Study | No
Kennedy et al., 2007 | Stat | Yes | Fli | Yes | Yes | Yes | Yes | Inst | Pres/Rec | No
Heymann & Garcia-Molina, 2006 | Stat | Yes | Del,Cit | No | Yes | No | No | Hier | Task-based | No
Benz et al., 2010 | Stat | Yes | Del | No | Yes | Yes | Yes | Hier | Pres/Rec | No
Giannakidou et al., 2008 | Hyb | Yes | Fli | Yes | Yes | Yes | No | Clus | No | No
Specia & Motta, 2007 | Hyb | Semi | Del,Fli | Yes | Yes | Yes | Yes | Onto | Desc Study | No
Angeletou et al., 2008 | Ont | Yes | Fli | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Cantador et al., 2008 | Ont | Yes | Fli,Del | Yes | Yes | No | Yes | Inst | Pres/Rec | No
Tesconi et al., 2008 | Ont | Yes | Del | Yes | Yes | Yes | Yes | Enri | Pres/Rec | No
Passant, 2007 | Ont | No | Oth | Yes | Yes | Yes | Yes | Enri | Desc Study | No
Maala et al., 2008 | Ont | Yes | Fli | Yes | Yes | No | Yes | Enri | Desc Study | No

Limitations
Statistical-based
• Most of the approaches do not distinguish between classes and instances
• Relation semantics is limited to a few types and is not precisely defined
• No domain knowledge
Ontology-based
• All the approaches produce either enrichments or instances (no classes)
• Relations are not identified
• No domain knowledge
Hybrid
• Semi-automatic ontology generation
• No domain knowledge

Proposal
Goal: Generate a domain baseline ontology, containing classes and relationships, out of folksonomy information.
[Diagram: Domain experts supply domain-relevant resources (URLs). Terminology Extraction over the folksonomy produces a list of domain terms. The domain terms drive the extraction of domain classes and relationships from LOD (Semantic Elicitation over Linked Open Data*).]
* "Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/"

Benefits
We propose a process, driven by the domain terminology in the folksonomy, to extract domain knowledge from large and generic knowledge bases.
• It may save time in the ontology development process.
• It allows ontology engineers to understand the domain with limited participation of domain experts.
• It yields smaller and more focused ontologies, which are potentially easier to understand and maintain.
• Complex queries and reasoning tasks may execute faster on smaller data sets.
• In observance of methodological practice, our technique harvests community knowledge and reuses existing ontologies.
• The ontology has links to external classes and relationships available in the Linked Open Data cloud.

Challenges
Problem: Tags lack semantics
• Ambiguity
• Synonyms
• Acronyms
• Morphological variations: plurals, singulars, verb conjugations
• Misspellings

Approach: Terminology Extraction
Goal: To extract domain terminology from the folksonomy
Folksonomy: A ⊆ U × T × R, G = (V, E) where V = U ∪ T ∪ R, and E = {(u, t, r) | (u, t, r) ∈ A}
Resource graph: G' = (V', E') where V' = R, and E' = {(ri, rj) | ∃ ((u, tm, ri) ∈ A ∧ (u, tn, rj) ∈ A ∧ tm = tn)}
Spreading Activation
Seeds: domain-relevant resources provided by domain experts.
Seed nodes are weighted with an activation value used to start the search.
The activation value spreads to adjacent nodes through an activation function.
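The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the Jaccard-style weighting of shared tags, the seed activation of 1.0, and the rule that a node is activated when its value exceeds a threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_resource_graph(assignments):
    """Build G' from (user, tag, resource) triples.

    Two resources are linked when the same user assigned the same tag
    to both, following the definition of E' above.
    """
    by_user_tag = defaultdict(set)
    tags_of = defaultdict(set)
    for user, tag, resource in assignments:
        by_user_tag[(user, tag)].add(resource)
        tags_of[resource].add(tag)

    neighbours = defaultdict(set)
    for resources in by_user_tag.values():
        for a, b in combinations(sorted(resources), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours, tags_of


def spread_activation(neighbours, tags_of, seeds, threshold=0.8):
    """Collect the tags of all resources activated from the seeds.

    Assumed activation function (hypothetical): the source activation
    scaled by the Jaccard overlap of the two resources' tag sets.
    """
    activation = {seed: 1.0 for seed in seeds}
    frontier = list(seeds)
    while frontier:
        source = frontier.pop()
        for node in neighbours[source]:
            if node in activation:
                continue  # already activated
            shared = tags_of[source] & tags_of[node]
            union = tags_of[source] | tags_of[node]
            value = activation[source] * len(shared) / len(union)
            if value > threshold:
                activation[node] = value
                frontier.append(node)

    terms = set()
    for resource in activation:
        terms |= tags_of[resource]
    return terms
```

A caller would pass the tag assignments and the experts' seed URLs, for example `spread_activation(*build_resource_graph(triples), seeds=["r1"])`. Lowering the threshold widens the set of activated resources and therefore the extracted vocabulary.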
Activation function: a function of (~) the tags shared between the visited node and the source node, and of the source node's activation value.
If the activation value exceeds the threshold, the node is marked as activated and the spreading continues to its adjacent nodes.
The tags of activated nodes are collected as domain terms.

Approach: Semantic Elicitation
Goal: To relate domain terms (tags) to DBpedia resources
• Normalize the tag to the standard notation of DBpedia resource titles
• Search for a resource with a label equal to the normalized tag using SPARQL
  • If it does not exist: use a spelling suggestion service and search again
  • If it exists: check whether it is related to a disambiguation resource
    • If true: retrieve the disambiguation candidates and select the candidate most similar to the tag context
      • Vector space model
      • Candidate resources represented using their textual descriptions
      • Tag represented using its context (i.e., co-occurring tags)
      • Selection of the most similar candidate using cosine similarity
    • If false: select the resource (default sense in Wikipedia)
A. García-Silva, I. Cantador, Ó. Corcho (2012). Enabling folksonomies for knowledge extraction: A semantic grounding approach. International Journal on Semantic Web and Information Systems 8 (3), 24-41.

Approach: Semantic Elicitation
Goal: To identify classes from resources
• Use the SPARQL ASK query form to verify whether the entity is a class
• If not: create queries to traverse all the possible paths of equivalence relations between the entity and a class in the RDF graph
Philipp Heim, Sebastian Hellmann, Jens Lehmann, Steffen Lohmann and Timo Stegemann. RelFinder: Revealing Relationships in RDF Knowledge Bases. In: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), pages 182-187. Springer, Berlin/Heidelberg, 2009.

    # <resource> is a placeholder for the IRI of the entity being checked.
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX owl:  <http://www.w3.org/2002/07/owl#>

    # Query 1: is the entity itself a class?
    ASK { <resource> rdf:type rdfs:Class }

    # Query 2: a class one owl:sameAs step away
    SELECT ?class WHERE {
      <resource> ?rel1 ?class .
      ?class rdf:type rdfs:Class .
      FILTER (?rel1 = owl:sameAs)
    }

    # Query 3: a class two owl:sameAs steps away
    SELECT ?class WHERE {
      <resource> ?rel1 ?node .
      ?node ?rel2 ?class .
      ?class rdf:type rdfs:Class .
      FILTER ((?rel1 = owl:sameAs) && (?rel2 = owl:sameAs))
    }

Approach: Semantic Elicitation
Goal: To identify relations between classes
• For each pair of classes, create queries to traverse all the possible paths between the two classes in the RDF graph, and retrieve the relationships.
Caveats
• May result in adding information to the ontology that is not relevant to the domain
  • Long paths
  • Paths that pass through abstract concepts or relationships
    • cyc:ObjectType
    • umbel:RefConcept

Approach: Semantic Elicitation
Minimizing the risk of adding non-relevant information to the ontology
• Keep the path length short
  • Our experiments show satisfactory results with short path lengths, which allow us to enrich the initial set of classes while preserving the precision of the ontology
• Avoid high-level concepts
  • Create lists of high-level concepts collected from the knowledge base vocabularies, and filter out the paths containing those concepts
  • Knowledge base core vocabularies are usually well documented
    • http://umbel.org/specifications/vocabulary
    • http://mappings.dbpedia.org/server/ontology/classes/
    • http://www.cyc.com/kb/thing
• Use semantic similarity distances
  • Wu and Palmer, 1994: depth of the classes and of their common subsumer in the taxonomy
  • Jiang and Conrath, 1997: subclasses per class, class depth, information content, etc.
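The filtering strategies above can be illustrated with a short Python sketch. It is only an example under stated assumptions: the function names, the representation of a path as a list of node names, and the single-rooted, single-inheritance taxonomy stored as a child-to-parent dictionary are hypothetical. The similarity is the standard Wu and Palmer ratio 2·depth(lcs) / (depth(a) + depth(b)), with the root at depth 1.

```python
def ancestors(cls, parent):
    """Return cls followed by its superclasses up to the root."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def depth(cls, parent):
    """Number of nodes on the path from the root to cls (root = 1)."""
    return len(ancestors(cls, parent))


def wu_palmer(a, b, parent):
    """Wu and Palmer similarity of two classes in a rooted tree taxonomy."""
    above_a = set(ancestors(a, parent))
    # First ancestor of b that also subsumes a: the deepest common subsumer.
    lcs = next(c for c in ancestors(b, parent) if c in above_a)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


def keep_path(path, max_length, abstract_concepts):
    """Keep a path only if it is short and avoids high-level concepts.

    path is a list of node names from one class to the other; its length
    is counted in hops (edges), not nodes.
    """
    hops = len(path) - 1
    if hops > max_length:
        return False
    return not any(node in abstract_concepts for node in path)
```

With a toy taxonomy such as `{"Organization": "Thing", "Company": "Organization", "Bank": "Company"}`, sibling branches that meet near the root score low, so a similarity cut-off can complement the path-length and blacklist filters when deciding which relationships to keep.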

Evaluation: Experiment in the financial domain
[Diagram labels: input; finance vocabulary; evaluation]

Evaluation: Experiment in the financial domain
[Diagram labels: terminology extraction; finance vocabulary; finance ontology]

Evaluation: Inspecting a financial ontology
• We ran the process with an activation threshold of 0.8
• The resulting ontology consists of 187 classes, 378 relations of 8 different types, and 12 modules.

Evaluation: Inspecting a financial ontology
Ontology modules evaluation

Module | Precision (Class)
Organization | 77.80%
Company | 88.50%
Person | 55.60%
Union | 3.74%
Banker | 100%
Human A | 100%
Stock Exchange | 84.60%
Money Transactions | 100%
Country | 100%
Research | 100%
Driver | 0%
Member | 100%

Class precision = 80.67%, relation precision = 96.4%

Conclusions
• We have developed a method for automatically generating domain ontologies
  • Limited user participation
• We benefit from the aggregation of individual classifications to extract an emergent domain vocabulary
• In accordance with methodological guidelines, we reuse existing knowledge (the Web of Data)
  • We tap into existing links between data sets to collect related semantic information
  • We avoid, to some extent, semantic mismatches
  • We avoid heterogeneous representations
• In practice, we expect the method to be used by ontology engineers to generate baseline ontologies that can be refined later according to the ontology requirements.

Future Work
• Develop a method to automatically assess the validity of the relationships found in the linked data cloud:
  • OpenCyc Stock Exchange is owl:sameAs UMBEL Exchange of User Rights
  • However:
    • Stock Exchange is an organization
    • Exchange of User Rights is an event
• Use semantic similarity measures to decide whether to include the relationships found along a path between two classes.
• Discover and use datasets in the linked data cloud that cover the domain of interest.