Semantic Web Application
Dr Abdulrahman Altahhan
Course Director for MSc in Data Science, Coventry University
ab8556@coventry.ac.uk

Computers
• Originally: computers were used for numerical calculation
• Currently: computers are used for information processing
– Databases
– Text processing
– Games

The Web
• Seeking, using or logging information
• Socialising
• Business: buying and selling
• What is the main entry point to the Web world?
• Search engines

Search Engine
• A search engine plays a game called prediction
• The user provides some clues about what s/he wants
• The search engine finds what it believes the user wants

Search Engines
• Search engines (SE) are really doing prediction (classification)
• Relevant / non-relevant documents
• Do they have high recall and low precision?

                      Predicted relevant     Predicted non-relevant
Actual relevant       True Positive (TP)     False Negative (FN)
Actual non-relevant   False Positive (FP)    True Negative (TN)

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)

Search Engines
• Despite the improvements of SE:
– High recall, low precision
– Sensitive to vocabulary (keywords)
– Semantically similar queries do not return the same results
– Single Webpage…
• Solution…?
• Either come up with more sophisticated retrieval techniques
• Or cheat…!
– Then we need to change the way we store things!
– We need to re-engineer the Web to be more suitable for machines
• Or both: the Semantic Web

Web story
• HTML: concerns the look only
– <H1> Title: Professor </H1>
• XML: concerns structure, with its own localized tags
– <Title> Professor </Title>
• RDF: concerns relationships, with a user-defined schema
– Makes no assumption about the domain
– All resources have a URI
– <“Mozart”, composed, “The Magic Flute”>
• RDFS: extends RDF with a standard ontology vocabulary
– Class, Property
– type, subClassOf
– domain, range
• Ontology: extends RDFS with the capability of constructing classes, as well as agreeing on the terms of a domain

Semantic Web Vision
Machine-processable, global Web standards:
• Assigning unambiguous names (URI)
• Expressing data, including metadata (RDF)
• Capturing ontologies (OWL)
• Query, rules, transformations, deployment, application spaces, logic, proofs, trust (in progress)
[Source: Emerging Web Technologies to Watch, Steve Bratt, W3C]

XML
User-definable and domain-specific markup
HTML:
<H1>Internet and World Wide Web</H1>
<UL>
<LI>Code: G52IWW
<LI>Students: Undergraduate
</UL>
XML:
<module>
<title>Internet and World Wide Web</title>
<code>G52IWW</code>
<students>Undergraduate</students>
</module>

RDF
• Resources:
– Books, Person, etc.
• Property
– “written by”, “title”, “married to”, etc.
• Statement
– object-attribute-value triple

RDF for semantic annotation
• RDF provides metadata about Web resources
• Object -> Attribute -> Value triples
• It has an XML syntax
• Chained triples form a graph
[Figure: RDF graph around the resource http://sepang.nottingham.edu.my/~bpayam/#Payam, with edges labelled has_image, has_owner, has_email and has_teaching; the other nodes are http://sepang.nottingham.edu.my/~bpayam/images/payam-barnaghi.png, UNiM, payam@nottingham and http://www.nottingham.edu.my/CSIT/G53ELC]
<rdf:Description rdf:about="#Payam">
<has_email>payam@nottingham</has_email>
</rdf:Description>

RDF Example
Source: http://www.w3.org/TR/swbp-skos-core-guide/

What does RDF Schema add?
• Defines vocabulary for RDF
• Organizes this vocabulary in a typed hierarchy
• Class, subClassOf, type
• Property, subPropertyOf
• domain, range
[Figure: a schema layer (RDFS) with the classes Staff, Lecturer and Research Assistant linked by subClassOf, and the property supervisedBy with its domain and range; a data layer (RDF) with the instances Tom and Alan, linked to the schema by type and to each other by supervisedBy. Adapted from: Studer et al, 04]

Basic Queries: Example
select X, Y
from {X} writtenBy {Y}
X and Y are variables; {X} writtenBy {Y} represents a resource-property-value triple

Conclusions about RDF(S)
• Next step up from plain XML:
– (small) ontological commitment to modeling primitives
– possible to define vocabulary
• However:
– no precisely described meaning
– no inference model
[Davies, 03]

Ontologies
• The term originated in philosophy
• For the purposes of the Semantic Web:
– “An ontology is an explicit and formal specification of a conceptualisation”. (R. Studer)

Ontologies and Semantic Web
• Provide a shared understanding of a domain
• Consist of a finite list of
– Terms (properties or classes)
• Ex.: staff members, students, courses, modules, lecture theatres, and schools are important concepts (terms)
– Relationships
• Ontologies are useful for improving the accuracy of Web searches.
• Web searches can exploit generalization/specialization information

Ontologies and Semantic Web (cont’d)
• In the context of the Web, ontologies provide a shared understanding of a domain.
• Such a shared understanding is necessary to overcome differences in terminology.

A Sample Ontology
[Figure: a sample ontology with concepts such as Object, Person, Student, Researcher, PhD Student, Topic, Document and Affiliation; relations such as is_a, knows, writes, described_in, is_about, similar, subTopicOf and instance_of; instance data such as Siggi, AIFB and F-Logic; and rules over these relations. From: Studer et al, 04]
• Major paradigms: Logic Programming, Description Logic
• Standards: RDF(S); OWL

Ontologies (OWL)
• RDFS does not provide ways to:
– state the similarity and/or differences of terms
– construct classes, not just name them
– let a program reason about some terms, e.g.:
• “if «Person» resources «A» and «B» have the same «foaf:email» property, then «A» and «B» are identical”
– etc.
• This led to the development of OWL (Web Ontology Language)
• Basically, we would like to engineer the Web in a form similar to a set of domain databases
Source: Introduction to the Semantic Web, Ivan Herman, W3C

Classes in OWL
• In RDFS, you can subclass existing classes… that’s all.
• In OWL, you can construct classes from existing ones:
– enumerate their content
– through intersection, union, complement
– through property restrictions
Source: Introduction to the Semantic Web, Ivan Herman, W3C

OWL classes can be “enumerated”
The OWL solution, where possible content is explicitly listed:
Source: Introduction to the Semantic Web, Ivan Herman, W3C

Why develop an ontology?
• To define web resources more precisely and make them more amenable to machine processing
• To make domain assumptions explicit
– Easier to change domain assumptions
– Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge
– Re-use domain and operational knowledge separately
• A community reference for applications
• To share a consistent understanding of what information means
[Davies, 03]

Inference Example
prof(X) → faculty(X)
faculty(X) → staff(X)
Therefore: prof(X) → staff(X)
From prof(michael) we can infer faculty(michael) and staff(michael)
Source: A Semantic Web Primer, Grigoris Antoniou and Frank van Harmelen, MIT Press

Semantic Web and AI?
• No human-level intelligence claims
• As with today’s WWW
– large, inconsistent, distributed
• Requirements
– scalable, robust, decentralised
– tolerant, mediated
• The Semantic Web will make extensive use of current AI
– Any advancement in AI will lead to a better Semantic Web
– Current AI is already sufficient to move towards realizing the Semantic Web vision
• As with the WWW, the Semantic Web will (need to) adapt fast
[Davies, 03]

Semantic Web & Knowledge Management
• Organising knowledge in conceptual spaces according to its meaning.
• Enabling automated tools to check for inconsistencies and extract new knowledge.
• Replacing query-based search with query answering.
• Defining who may view certain parts of the information

Elsevier: Horizontal Information Products
• Elsevier is a leading scientific publisher
• Its papers are organized according to journals (vertical)
• Different types of journals
Problem
• Sometimes subscribers are interested in getting everything related to a certain topic that is spread across traditional disciplines
• Example: Alzheimer’s disease (biology, medicine, chemistry, etc.)
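
Aside: the Inference Example as code
The rule chain on the Inference Example slide (prof(X) → faculty(X), faculty(X) → staff(X)) can be sketched as a small forward-chaining loop. This is only an illustrative Python sketch, not how an RDFS or OWL reasoner is actually implemented; the names RULES and forward_chain are hypothetical and do not come from the slides or from any library.

```python
# Unary implication rules from the slide (body predicate -> head predicate).
# RULES and forward_chain are illustrative names, not part of any real library.
RULES = {"prof": "faculty", "faculty": "staff"}


def forward_chain(facts, rules=RULES):
    """Return all facts derivable from `facts` by repeatedly applying `rules`.

    A fact is a (predicate, individual) pair, e.g. ("prof", "michael").
    """
    derived = set(facts)
    changed = True
    while changed:  # stop once a full pass adds nothing new
        changed = False
        for predicate, individual in list(derived):
            head = rules.get(predicate)
            if head is not None and (head, individual) not in derived:
                derived.add((head, individual))
                changed = True
    return derived


closure = forward_chain({("prof", "michael")})
print(sorted(closure))
# → [('faculty', 'michael'), ('prof', 'michael'), ('staff', 'michael')]
```

Only prof(michael) is stored; faculty(michael) and staff(michael) are derived, which is the kind of new knowledge an inference model adds on top of the stored data.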
Elsevier: Horizontal Information Products
Problem
• Topic across disciplines
– Alzheimer’s disease
– biology, medicine, chemistry, etc.
– The same topic has different names
Solution
– A thesaurus (lightweight ontology)
– Each domain has its own (e.g. MeSH for medicine)
– Used to access information sources such as EMBASE and ScienceDirect
– Still not the best, fully ontological approach
– But it is a start
– Why?
– Elsevier uses EMTREE as a single underlying ontology against which all vertical information sources are indexed
• The Semantic Web plays the following roles:
– RDF is used as an interoperability format between heterogeneous data sources
– The EMTREE ontology is represented in RDF (not the best thing to do!)
– Each separate data source is mapped onto EMTREE
– The ontology is used as the entry point for all data sources

Audi: Data Integration
Problem
• A similar problem to Elsevier’s, but internal
• Data integration: the highest cost factor (IT-wise) for large companies
• Audi:
– 51,000 employees
– 22 billion in revenue
– 700,000 cars annually
– 1000 databases: opportunities are missed because the data sources are not interconnected
– The databases cannot be queried with one simple query that returns reliable, timely information that could be used for decision making
– Audi relies on costly manual code generation and point-to-point translation scripts for data integration

Audi: Data Integration
Solution
• One option is to create a gigantic data warehouse or big data analysis, which entails a lot of changes and migration issues
• Or ontologies:
• Rationalize disparate data sources into one body of information
• Create an ontology for data and content sources
• Add generic domain information
• Integration can be done without disturbing existing applications
• The ontology is mapped to the data sources (fields, records, files, documents)
• This gives applications access to the data through the ontology

Audi: Data Integration
Problem
Application A
• A is using the encoding
Application B
• B is using the encoding
However, there is no way for a computer to know that both talk about the same thing and that Olympus-OM-10 is a type of SLR

Audi: Data Integration
Solution
• We can provide ad hoc translation for these data sources; however, it is not portable
• Instead, we might write a simple camera ontology in OWL

Audi: Data Integration
Solution
Application A
• A is using the first encoding
Application B
• B is using the second encoding
• It receives data from A
• B parses the XML doc from A
• B encounters SLR (?)
• B consults the camera ontology: “What do you know about SLR?”
Ontology
• I know the solution!
• The ontology returns “SLR is a type of camera”

• The point here is that syntactic divergence is no longer a hindrance.
• In fact it is desirable, because each application can use the encoding that suits it locally
• The ontology provides a single integration point rather than n² individual mappings

Skill Finding at Swiss Life
• The Swiss Life Group is the largest life insurance company in Switzerland
– 11,000 employees
– $14 billion of written premiums
– Subsidiaries, branches and offices in 50 countries
Problem
– It is distributed over wide, geographically and culturally diverse areas
– The construction of a company-wide skills repository is difficult
Solution
– Swiss Life used a hand-built ontology to cover skills in three organizational units:
• IT
• Private Insurance
• HR
– It consisted of 700 concepts + 180 educational concepts
– 130 job function concepts divided across the 3 units

Conclusion
• Interoperation and mapping are the main obstacles to realizing the Semantic Web
• Understanding what is available is a necessary prerequisite to the specific application
• The solutions are not always complete or optimal
• We have described a number of case studies of applying Semantic Web technologies to support interoperation

Questions?
• Discussions

Sources
• A Semantic Web Primer, Grigoris Antoniou and Frank van Harmelen, ISBN 0-262-01210-3, 2004, The MIT Press.
• W3C Semantic Web, http://www.w3.org/2001/sw/
• The Semantic Web Community Portal, http://www.semanticweb.org
• The European Semantic Web Conference