Next Generation Semantic Web Applications

Prof. Enrico Motta
Director, Knowledge Media Institute
The Open University
Milton Keynes, UK

Structure of the Talk
• Quick Recap: What is the Semantic Web?
• State of the art: 1st Generation SW Applications
  – Emphasis on ontology-driven data aggregation
  – Limited in their ability to exploit large-scale, heterogeneous semantic markup
• Key research issues
  – What needs to be done to enable the effective development of the next generation of SW applications
  – Need for a different approach to some key research areas
  – How the SW itself can be exploited to address such key research issues

Quick Recap: What is the Semantic Web?

The Semantic Web
A large-scale, heterogeneous collection of formal, machine-processable, ontology-based statements (semantic metadata) about web resources and other entities in the world, expressed in an XML-based syntax

[Figure: Ontology and Metadata (RDF triples) describing a UoD; ontology terms shown: Person, Organization, Organization-Unit, String, hasAffiliation, worksInOrgUnit, hasJobTitle, partOf]

<akt:Person rdf:about="akt:EnricoMotta">
  <rdfs:label>Enrico Motta</rdfs:label>
  <akt:hasAffiliation rdf:resource="akt:TheOpenUniversity"/>
  <akt:hasJobTitle>kmi director</akt:hasJobTitle>
  <akt:worksInOrgUnit rdf:resource="akt:KnowledgeMediaInstitute"/>
  <akt:hasGivenName>enrico</akt:hasGivenName>
  <akt:hasFamilyName>motta</akt:hasFamilyName>
  <akt:worksInProject rdf:resource="akt:Neon"/>
  <akt:worksInProject rdf:resource="akt:X-Media"/>
  <akt:hasPrettyName>Enrico Motta</akt:hasPrettyName>
  <akt:hasPostalAddress rdf:resource="akt:KmiPostalAddress"/>
  <akt:hasEmailAddress>e.motta@open.ac.uk</akt:hasEmailAddress>
  <akt:hasHomePage
    rdf:resource="http://kmi.open.ac.uk/people/motta/"/>
</akt:Person>

SW = A Conceptual Layer over the web

SW is Heterogeneous!

Generating semantic markup
[Figure: semantic markup generated as RDF triples]

Key aspects of the SW
• Size (= Huge)
  – Sem. markup (eventually to reach) the same order of magnitude as the web
• Conceptual Heterogeneity (= Big)
  – Sem. markup based on many different ontologies
• Rate of change (= Very High)
  – Data generated all the time by human and artificial agents…
• Provenance (= Very Heterogeneous)
  – …Hence provenance itself is extremely heterogeneous
• Trust (= very variable and subjective)
  – A side-effect of heterogeneous provenance
• Data Quality (= very variable)
  – No guarantee of correctness
• Intelligence (= by-product of size and heterogeneity)
  – Rather than a by-product of sophisticated problem solving

Compare with traditional KBS
• Size (= Small or Medium)
  – KBS normally small to medium size
• Conceptual Heterogeneity (= Not an issue)
  – KBS normally based on a single conceptual model
• Rate of change (= Very Low)
  – Change rate under developers' control (hence, low)
• Provenance (= Not an issue)
  – KBS are normally created ad hoc for an application by a centralised team of developers
• Trust (= not a major issue)
  – Centralisation of the development
process implies no significant trust issues
• Data Quality (= not a major issue)
  – Again, centralisation guarantees data quality across the board
• Intelligence (= by-product of complex, task-centric reasoning)
  – E.g., sophisticated diagnostic, planning systems…

The Semantic Web today

1st Generation SW Applications
[Figure labels: Bibliographic Data; CS Dept Data; AKT Reference Ontology; RDF Data]

<rdf:Description rdf:about="http://www.ecs.soton.ac.uk/info/#person-01269">
  <ns0:family-name>Gibbins</ns0:family-name>
  <ns0:full-name>Nicholas Gibbins</ns0:full-name>
  <ns0:given-name>Nicholas</ns0:given-name>
  <ns0:has-email-address>nmg@ecs.soton.ac.uk</ns0:has-email-address>
  <ns0:has-affiliation-to-unit rdf:resource="http://194.66.183.26/WEBSITE/GOW/ViewDepartment.aspx?Department=750"/>
</rdf:Description>
</rdf:RDF>

Features of 1st generation SW Applications
• Typically use a single ontology
  – Usually providing a homogeneous view over heterogeneous data sources
• Limited use of existing SW data
• Closed to semantic resources
• Limited interactivity
  – In contrast with typical web 2.0 applications

Hence: current SW applications are far more similar to traditional KBS (closed semantic systems) than to 'real' SW applications (open semantic systems)

It is still early days..
[Figure: 1895 vs. 2006]

Next Generation SW Applications

Next generation SW applications
[Figure: NG SW Application]
• Able to exploit the SW at large
  – Hence: Multi-Ontology
• Supporting interactivity
  – E.g., allowing users to add semantic data
  – Hence, open with respect to SW resources
• Ideally also able to exploit non-SW data
  – E.g., folksonomies
  – Hence, embedding powerful information extraction engines

Two systems we have built
• Magpie
• AquaLog

Magpie Components
[Figure labels: Ontology cache (Lexicon); Enriched Web Page; Magpie Hub; Web Page; Problem Domain & Resources; Jabber Server; Ontology-based Proxy Server; Semantic Log]
(found-item 3275578832 localhost #u"http://localhost/people/motta/" john-domingue john-domingue)
(found-item 3275578832 localhost

AquaLog: Ontology-Driven Question Answering
• NL sentence input: Which is the capital of Spain?
• Linguistic Analysis → query triples: (?, capital, Spain)
• Mapping Engine → result triples: <Spain, has-capital-city, Madrid>
• NL Generation → answer: Madrid

PowerMagpie: Semantic browsing on the 'open' SW
Need for mechanisms for automatically identifying semantic markup relevant to the current page, user, browsing session, etc.

PowerAqua: QA on the 'open' semantic web
Need for mechanisms for automatically locating ontologies relevant to the current query, mapping user terminology to ontologies, integrating info from different ontologies, etc.

What needs to be done to facilitate the development of such 2nd generation SW applications?
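As a point of reference for the AquaLog slide above, here is a minimal sketch of answering the query triple (?, capital, Spain) against a single ontology. It is not AquaLog's implementation: the names `Triple`, `KB`, `RELATION_LEXICON` and `answer` are hypothetical, the lexicon stands in for the mapping engine, and the only fact in the toy knowledge base is the one shown on the slide.

```python
from typing import List, NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Toy ontology triples; the single fact is the one shown on the slide.
KB = [Triple("Spain", "has-capital-city", "Madrid")]

# Hypothetical lexicon from user vocabulary to ontology relations.
RELATION_LEXICON = {"capital": "has-capital-city"}

QueryTriple = Tuple[Optional[str], str, Optional[str]]


def answer(query: QueryTriple, kb=KB, lexicon=RELATION_LEXICON) -> List[str]:
    """Resolve a query triple such as (None, "capital", "Spain").

    None marks the unknown term ('?' on the slide).
    """
    first, relation_term, second = query
    known = first if first is not None else second
    predicate = lexicon.get(relation_term)
    if predicate is None or known is None:
        return []
    answers = []
    for triple in kb:
        if triple.predicate != predicate:
            continue
        # The known term may sit on either side of the ontology triple.
        if triple.subject == known:
            answers.append(triple.obj)
        elif triple.obj == known:
            answers.append(triple.subject)
    return answers


print(answer((None, "capital", "Spain")))
```

Because the lexicon and the knowledge base are fixed in advance, the sketch only works for one ontology chosen at design time, which is the limitation the PowerAqua slide points at.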
Dynamic Ontology Selection
• First: powerful support for ontology selection
• Both PowerAqua and PowerMagpie rely heavily on ontology selection to locate possibly relevant knowledge in response to
  – User queries (PowerAqua)
  – Accessing web pages (PowerMagpie)
• Hence, ontology selection is a crucial task for both systems

Current support for ontology selection

Limitations of Swoogle
• Query/Search
  – Only keyword search; we need more powerful query methods (e.g., the ability to pose formal queries)
• Repository structure
  – Very weak in Swoogle; not even duplicates are dealt with
  – Need for automatic derivation of relations between ontologies
    • E.g., same-ontology-as, ontology-extends, ontology-incompatible-with, etc.
  – We need these relations to structure the repository and to support more powerful ranking methods (see next bullet point)
• Ontology ranking
  – Swoogle only uses a 'popularity-based' method; we need other methods as well

We also need:
• Methods for fast extraction of ontology modules
  – Typically we only want the part of the ontology relevant to our current needs
• Methods for the integration of information derived from different ontologies
  – In the context of QA this problem typically reduces to that of deciding whether two instances denote the same entity

Even more importantly..
• Need to look at a number of key research issues in the context provided by NG-SW applications
  – Example: Ontology Mapping
    • Current work focuses on design-time mapping of complete ontologies
  – Example: Ontology Selection
    • Current work focuses on user-mediated ontology selection
  – Example: Ontology Modularization
    • Current work by and large assumes that the user is in the loop

A new application scenario
• NG-SW applications require algorithms able to perform tasks such as selecting, modularizing, and mapping ontologies at run time
• Moreover, in such a context, mapping is concerned with mapping ontology fragments, rather than complete ontologies

So What?
• Time to go beyond 1st generation applications
• 2nd generation SW applications will exploit much more fully the large-scale semantic markup provided by the SW
• Many issues to be addressed:
  – Better ontology crawling, indexing, retrieving and ranking support
  – Mapping, selection, and modularization methods appropriate for NG-SW applications
  – Further acceleration needed in the generation of semantic markup

Exploiting the SW itself to tackle its heterogeneity
• Interestingly, an NG-SW-based approach can also be used to tackle key SW tasks, such as Ontology Mapping
  – Based on the use of the SW itself as background knowledge

Exploiting Large-Scale Semantics
Case Study: Using the Semantic Web as background knowledge in Ontology Mapping

Ontology Mapping: State of the Art
• State-of-the-art methods rely on a combination of:
  – Label similarity methods
    • e.g., Full_Professor = FullProfessor
  – Structure similarity methods
    • Using taxonomic information or information about domain and range of associated properties
• However, as pointed out by Aleksovski et al. (EKAW, 2006):
  – In many cases there is insufficient lexical overlap
  – In many cases the source and target ontologies do not have sufficient structure to allow effective structure-based mapping

Use of background knowledge for ontology mapping
[Figure: the unknown relation between A and B (A ? B) is derived via Background Knowledge]

External Source = One Ontology
Aleksovski et al. EKAW’06
• Map candidate terms into concepts from a richly axiomatized domain ontology (anchors)
• Derive a mapping based on the relation of the anchor terms
[Figure: A is anchored to A' and B to B' in the background ontology; the relation rel between A' and B' yields A rel B]
Advantages:
• Handles dissimilar ontologies
• Returns semantic mappings
Disadvantages:
• Assumes that a suitable domain ontology is available
• Approach only suitable for closed domains

External Source = Web
van Hage et al.
ISWC’05
• Rely on Google and an online dictionary in the food domain to extract semantic relations between candidate mappings using IR techniques
[Figure: A rel B derived from Google + OnlineDictionary]
Advantages:
• General purpose IR methods
Disadvantages:
• IR methods introduce noise

External Source = WordNet
Lopez et al. ESWC ’05
• Use WordNet to map queries expressed in the user's terminology to a domain ontology to support question answering
[Figure: A rel B derived from WordNet]
Advantages:
• General purpose
Disadvantages:
• Knowledge sparseness
• Works best with concepts, not so useful with relations
• WordNet is not an ontology!!!

Knowledge-poor ontology mapping
• Isn't it actually a bit strange that such complex and knowledge-poor methods are devised, when the SW already provides so much background knowledge?

External Source = SW
Proposal:
• Rely on online ontologies (Semantic Web) to derive mappings
• Ontologies are dynamically discovered and combined
[Figure: A rel B derived from the Semantic Web]
Advantages:
• General purpose
• Does not introduce noise
• Works with any kind of domain entities (concepts, relations, instances)

Strategy 1 - Definition
Find ontologies that contain classes equivalent to A and B, and use their relationship in those ontologies to derive the mapping.
[Figure: ontologies O1, O2, …, On on the Semantic Web, each containing Ai' and Bi', yield A rel B]
For each ontology use these rules:
• A' ≡ B' ⇒ A ≡ B
• A' ⊑ B' ⇒ A ⊑ B
• A' ⊒ B' ⇒ A ⊒ B
• A' ⊥ B' ⇒ A ⊥ B
These rules can be extended to take into account indirect relations between A' and B', e.g., between parents of A' and B': a relation between A' and C, together with a relation between C and B', yields a relation between A' and B'.

Strategy 1 - Variants
Quick variant: Stop as soon as a relation is found
[Figure: A and B mapped via A1' and B1' in O1]
Precise variant: Derive all possible mappings from all ontologies and combine them into a final mapping.
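The two variants above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the ontologies are toy in-memory lists rather than ontologies discovered online, anchoring A to A' is reduced to a case-insensitive label match, and the relation names, the `ONTOLOGIES` data and all function names are hypothetical.

```python
from collections import Counter

# Relation names are illustrative labels, not taken from the slides.
INVERSE = {
    "subclass-of": "superclass-of",
    "superclass-of": "subclass-of",
    "equivalent-to": "equivalent-to",
    "disjoint-with": "disjoint-with",
}

# Toy stand-ins for online ontologies: name -> list of (term, relation, term).
ONTOLOGIES = {
    "O1": [("Cat", "subclass-of", "Mammal")],
    "O2": [("cat", "subclass-of", "animal")],
    "O3": [("Animal", "superclass-of", "Cat")],
    "O4": [("Cat", "disjoint-with", "Animal")],  # deliberately contradictory
}


def same_term(x, y):
    # Assumption: anchoring A to A' is a case-insensitive label match.
    return x.lower() == y.lower()


def relations_between(a, b, statements):
    """Yield every rel such that the ontology states A' rel B' (or its inverse)."""
    for s, rel, o in statements:
        if same_term(s, a) and same_term(o, b):
            yield rel
        elif same_term(s, b) and same_term(o, a):
            yield INVERSE[rel]


def quick_variant(a, b, ontologies):
    """Return the first mapping found, with the ontology it came from."""
    for name, statements in ontologies.items():
        for rel in relations_between(a, b, statements):
            return (a, rel, b), name
    return None


def precise_variant(a, b, ontologies):
    """Count, per relation, how many ontologies support the mapping A rel B."""
    votes = Counter()
    for statements in ontologies.values():
        votes.update(set(relations_between(a, b, statements)))
    return votes


print(quick_variant("Cat", "Animal", ONTOLOGIES))
print(precise_variant("Cat", "Animal", ONTOLOGIES))
```

The quick variant stops at O2 and never sees the contradictory statement in O4, while the precise variant collects every candidate relation with its support count and leaves open how the candidates are combined into a final mapping.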
[Figure: A and B mapped via A1', B1' in O1 and A2', B2' in O2]

Dealing with Contradictions:
• Return all mappings even if contradictory
• Return a mapping only when there is no contradiction
• Return the most frequent mapping (i.e., the mapping derived from most ontologies)
• Return the mappings with 'higher authority' (based on metrics of ontology evaluation or trust)
• Try to combine mappings
  [Figure: a mapping between A and B and a mapping between B and A combined into a single mapping between A and B]

Strategy 1 - Examples
[Figure labels: Example 1: Beef (SR-16) and Food (FAO_Agrovoc); Semantic Web ontology Tap: Food, MeatOrPoultry, RedMeat, Beef. Example 2: Researcher (ISWC) and AcademicStaff (SWRC); Semantic Web ontology ka2.rdf: AcademicStaff, Researcher]

Strategy 2 - Definition
Principle: If no ontologies are found that contain the two terms, then combine information from multiple ontologies to find a mapping.
Details:
(1) Select all ontologies containing A' equivalent to A
(2) For each ontology containing A':
  (a) if A' ⊑ C, find a relation between C and B
  (b) if A' ⊒ C, find a relation between C and B
Rules:
(r1) A' ⊑ C and C ⊑ B ⇒ A ⊑ B
(r2) A' ⊑ C and C ≡ B ⇒ A ⊑ B
(r3) A' ⊑ C and C ⊥ B ⇒ A ⊥ B
(r4) A' ⊒ C and C ⊒ B ⇒ A ⊒ B
(r5) A' ⊒ C and C ≡ B ⇒ A ⊒ B
[Figure: A' and C related in one Semantic Web ontology, C' and B' related in another; the two relations combine into A rel B]

Strategy 2 - Examples
Ex1: Chicken vs. Food
  Chicken ⊑ Poultry (midlevel-onto), Poultry ⊑ Food (Tap) ⇒ (r1) Chicken ⊑ Food
  (Same results for Duck, Goose, Turkey)
Ex2: Ham vs. Food
  Ham ⊑ Meat (pizza-to-go), Meat ⊑ Food (SUMO) ⇒ (r1) Ham ⊑ Food
Ex3: Ham vs. Seafood
  Ham ⊑ Meat (pizza-to-go), Meat ⊥ Seafood (wine.owl) ⇒ (r3) Ham ⊥ Seafood

Conclusions
• Using the SW as background knowledge for ontology mapping has several benefits
  – Suitable for our NG-SW scenario, as there is no need for design-time selection of background knowledge
  – Even when design-time selection is feasible, it is suitable for those cases where a suitable domain ontology cannot be found
  – Reduces noise by exploiting only ontologies
  – Can be tailored to handle multiple solutions
  – Can be integrated with other approaches, based on lexical and structural analysis

If you would like to find out more..
• 'Vision' papers
  – Motta, E., Sabou, M. (2006). "Next Generation Semantic Web Applications".
    1st Asian Semantic Web Conference, Beijing.
  – Motta, E., Sabou, M. (2006). "Language Technologies and the Evolution of the Semantic Web". LREC 2006, Genoa, Italy.
  – Motta, E. (2006). "Knowledge Publishing and Access on the Semantic Web: A Socio-Technological Analysis". IEEE Intelligent Systems, Vol. 21, 3, (88-90).
• Ontology Modularization
  – d'Aquin, M., Sabou, M., Motta, E. (2006). "Modularization: A key for the dynamic selection of relevant knowledge components". ISWC 2006 Workshop on Ontology Modularization.
• Ontology Mapping
  – Lopez, V., Sabou, M., Motta, E. (2006). "Mapping the real semantic web on the fly". International Semantic Web Conference, Athens, Georgia, USA.
  – Sabou, M., d'Aquin, M., Motta, E. (2006). "Using the semantic web as background knowledge for ontology mapping". ISWC 2006 Workshop on Ontology Mapping.
• Ontology Selection
  – Sabou, M., Lopez, V., Motta, E. (2006). "Ontology Selection for the Real Semantic Web: How to Cover the Queen’s Birthday Dinner?". Proceedings of EKAW 2006, Podebrady, Czech Republic.