Three Theses of Representation in the Semantic Web

Ian Horrocks
University of Manchester
Manchester, UK
horrocks@cs.man.ac.uk

Peter F. Patel-Schneider
Bell Labs Research
Murray Hill, NJ, USA
pfps@research.bell-labs.com

Semantic Web Languages
• SemWeb aims to make content accessible to automated processes
  – Add semantic markup (meta-data) describing content/function of resources
• Need a common way of providing meta-data so that:
  – It can be understood and manipulated by automated processes (“agents”)
  – Agents can integrate meta-data from different sources
• Proposed solution is famous language “layer cake”:
  [Figure: Semantic Web language “layer cake”]

Language Architecture
• Relationship between adjacent layers not clear
  – XML ↔ RDF relationship purely syntactic
  – RDF ↔ Ontology layer relationship should be something more?
• RDF is proposed as base for SemWeb languages
  – Used to add metadata annotations to resources
  – Also used to define syntax and semantics of subsequent layers
• Not clear that RDF is appropriate for all these functions
  – Limited set of syntax constructs (triples)
  – Not possible to extend syntax (as it is, e.g., when using XML)
  – Uniform semantic treatment of triple syntax
  – Non standard KR thesis and model theory
• May facilitate development of SemWeb to use more standard KR thesis…

Ontology Language Layer
• Ontologies set to play key role in SemWeb
  – source of shared and precisely defined terms for use in meta-data
• RDF already extended to RDFS
  – Hierarchies of classes and properties
  – Domain and range constraints on properties
• More expressive ontology languages clearly required
  – With logical connectives, quantifiers, transitive properties, etc.
  – E.g., OIL, DAML+OIL, and now OWL
• Possible choices for language layering:
  – Base ontology language layer(s) on RDF(S)
  – Base ontology language layer(s) on “classical” FOL
  – Base ontology language layer(s) on SKIF/Lbase/CL languages

Semantics and Model Theories
• Ontology/KR languages aim to model (part of) world
• Constructs in language correspond to entities in world
• Meaning given by mapping to some formal system
  – E.g., a logic such as FOL with its own well defined semantics
  – or a data model such as XQuery data model for XML
  – or (for more expressive languages) a Model Theory (MT)
• MT defines relationship between syntax and interpretations
  – Can be many interpretations (models) of one piece of syntax
  – Models supposed to be analogue of (part of) world
    • E.g., elements of model correspond to objects in world
  – Formal relationship between syntax and models
    • Structure of models must reflect relationships specified in syntax
  – Inference (e.g., entailment) defined in terms of MT
    • E.g., A ⊨ B iff every model of A is also a model of B

FOL Thesis
• Base SW languages on established FO hierarchy
  – Propositional logic
  – Decidable FOL subsets (e.g., DL, Horn)
  – Undecidable FOL subsets
  – Full FOL (and even HOL)
• Higher layers extend syntax
  – Upwards compatibility, i.e., syntax retains same meaning in higher layers
• Semantics via FOL mapping or standard FO model theory
  – Individual i → element of domain (i^I ∈ D)
  – Class C → set of elements (C^I ⊆ D)
  – Property P → binary rel on D (P^I ⊆ D × D)

(Dis)advantages of FOL Thesis
• Pros
  – Based on well known and extensively studied formalism
  – Wealth of theoretical knowledge and practical experience
  – Family of sub-languages with well known formal properties
    • E.g., decidability, complexity
  – Highly optimised reasoners for FOL and many sub-languages
    • E.g., DL reasoners, Horn (rule) reasoners, FOL provers
  – Mapping to FOL provides easy integration, e.g., of DL and Horn languages
  – FO subset of RDFS fits well in this framework
• Cons
  – No classes as instances (unless extended to HOL)
  – Relatively poor fit with full RDFS
    • Can be axiomatised in FOL, but may damage semantic interoperability and computational properties

Axiomatisation
• An axiomatisation can be used to embed RDFS in FOL, e.g.:
  – Triple x P y translated as holds2(P,x,y)
  – Axioms capture semantics of language, e.g.:
• Problems with axiomatisations include
  – May require large and complex set of axioms
  – Difficult to prove semantics have been correctly captured
  – Axiomatisation may greatly increase computational complexity
    • RDFS → undecidable (subset of) FOL
  – No interoperability unless all languages similarly axiomatised
    • E.g., in DAML+OIL, C subClassOf D equivalent to ∀x.C(x) → D(x)
    • But have to axiomatise as holds2(subClass, C, D)

SKIF/Lbase/CL Thesis
• Base SW languages on SKIF/Lbase/CL
  – Similar to FOL thesis, but FOL replaced with CL
• Higher layers extend syntax
  – Upwards compatibility, i.e., syntax retains same meaning in higher layers
• Semantics via mapping into CL
• CL provides model theory
  – Individual i → element of domain (i^V ∈ D)
  – Class C → element of domain (C^V ∈ D)
  – Property P → element of domain (P^V ∈ D)
• Second mapping (ext)
  – Class elt w → set of elts (ext(w) ⊆ D)
  – Prop elt k → binary rel (ext(k) ⊆ D × D)

(Dis)advantages of CL Thesis
• Pros
  – Classes as individuals without HOL extension
  – Can use as a basis for a family of sub-languages
  – Mapping to CL provides easy integration of sub-languages
  – Better fit with RDFS
• Cons
  – Relatively new and untried
  – Little known about CL sub-languages
  – Confusion w.r.t. FOL compatibility
  – RDFS still requires axiomatisation due, e.g., to rdf:type being in domain of discourse
    • Still no direct semantic interoperability with RDFS
  – Computational pathway only via (performance-damaging) FOL mapping

Confusion w.r.t. FOL Compatibility
• SKIF/Lbase/CL use same syntax as FOL
  – But allow variables to occur in predicate positions
• Originally asserted that SKIF semantics coincide with FOL for well formed FOL sentences
• Subsequently shown to be wrong for FOL with equality
  – E.g.,
• Moral of the story
  – May confuse users more familiar with classical FOL
  – Easy to make mistakes with complex new formalisms
  – Risky to base future of SemWeb on such a new formalism

RDF Thesis
• All SW languages based on triples
  – Triple based syntax
  – Semantics compatible with semantics of triples as defined by RDF MT
• Upwards & downwards compatibility
  – Syntax retains same meaning in higher layers
  – Higher layer syntax is valid in lower layers
• Semantics via RDF model theory
  – Similar to CL, but only binary predicates
  – Language syntax also in domain of discourse
  – Higher layers impose additional constraints on models
• Syntax must be encoded as triples
  – Awkward for complex constructs
  – Resulting triples also have meaning

(Dis)advantages of RDF Thesis
• Pros
  – (Supposed) interoperability between language layers
  – RDF tools can be used to parse all SW languages into triples
  – Large ontologies/KBs can be stored in triple DBs
• Cons
  – Achieving real (semantic) interoperability may be difficult or impossible
    • E.g., efforts to layer OWL on top of RDF(S)
  – Triple encoding of complex languages such as OWL is very clumsy
  – Triples introduced by encodings have semantic consequences
    • E.g., first-rest triples used in list syntax have same consequences as ground facts (even though ordering of list may be arbitrary)
  – Not clear if technique can be extended to more expressive languages
    • E.g., full FOL
  – Computational pathway only via (performance-damaging) FOL mapping

Summary
• Formal meaning of SW languages crucial to interoperability
  – Common semantic underpinning facilitates layered architecture
• Widely assumed that RDF will provide this underpinning
  – But layering on top of RDF(S) may be difficult/impossible and does not lead to any direct computational pathway
  – Moreover, benefits are not clear
• Alternative would be to use standard FOL as underpinning
  – Well established and well understood
  – Established family of languages capturing different trade-offs
  – Direct computational pathway for FOL and many sub-languages
  – FO subset of RDF(S) would fit well in this framework
• Third approach is to use CL as underpinning
  – Relatively new and untested
  – May not solve problems with RDF(S)

Perhaps we should consider recalling the Semantic Web bandwagon in order to carry out a safety modification on the RDF component!
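
Illustration: entailment via models. The “Semantics and Model Theories” slide defines A ⊨ B as “every model of A is also a model of B”. A minimal sketch of that definition for the propositional layer of the FO hierarchy, by brute-force enumeration of interpretations. The function names (`models`, `entails`) and the atoms `rain` and `wet` are assumptions made up for this example, not part of the slides.

```python
from itertools import product


def models(formula, atoms):
    """Yield every truth assignment over `atoms` that satisfies `formula`."""
    for values in product([False, True], repeat=len(atoms)):
        interpretation = dict(zip(atoms, values))
        if formula(interpretation):
            yield interpretation


def entails(a, b, atoms):
    """A |= B iff every model of A is also a model of B."""
    return all(b(m) for m in models(a, atoms))


# Hypothetical example: A is "rain and (rain implies wet)", B is "wet".
atoms = ["rain", "wet"]
a = lambda i: i["rain"] and (not i["rain"] or i["wet"])
b = lambda i: i["wet"]

print(entails(a, b, atoms))  # True
print(entails(b, a, atoms))  # False
```

Enumeration only works because the propositional layer has finitely many interpretations; for the more expressive layers the same definition applies, but checking it needs a dedicated reasoner.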
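
Illustration: FOL-style versus CL-style interpretations. The “FOL Thesis” slide maps a class directly to a set of domain elements, while the “SKIF/Lbase/CL Thesis” slide maps it first to a domain element and then, through ext, to a set. The sketch below contrasts the two on a toy domain to show why the second style admits classes as instances. All names (`Felix`, `Cat`, `Species`, `chases`, and the dictionaries holding the mappings) are invented for the example.

```python
# FOL-style interpretation: classes ARE subsets of the domain D.
D_fol = {"felix", "tom"}
ind_I = {"Felix": "felix", "Tom": "tom"}      # i^I in D
cls_I = {"Cat": {"felix", "tom"}}             # C^I subset of D
prop_I = {"chases": {("tom", "felix")}}       # P^I subset of D x D


def fol_instance_of(individual, cls):
    return ind_I[individual] in cls_I[cls]


# CL-style interpretation: every name denotes an ELEMENT of D,
# and a second mapping ext gives class and property extensions.
D_cl = {"felix", "tom", "cat", "species", "chases"}
V = {"Felix": "felix", "Tom": "tom", "Cat": "cat",
     "Species": "species", "chases": "chases"}
ext_class = {"cat": {"felix", "tom"}, "species": {"cat"}}
ext_prop = {"chases": {("tom", "felix")}}


def cl_instance_of(name, cls):
    return V[name] in ext_class.get(V[cls], set())


print(fol_instance_of("Felix", "Cat"))   # an individual in a class
print(cl_instance_of("Felix", "Cat"))    # same fact, CL style
print(cl_instance_of("Cat", "Species"))  # a class used as an instance
```

In the FOL-style tables `Cat` denotes a set, so it has no entry in `ind_I` and cannot itself be a member of another class without moving to HOL; in the CL-style tables it is just another element of the domain.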
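
Illustration: a holds2 axiomatisation. The “Axiomatisation” slide translates a triple x P y into holds2(P,x,y) and then relies on axioms to capture the semantics. The sketch below assumes two such axioms, chosen here for illustration and not taken from the slides: subClassOf is transitive, and rdf:type propagates along subClassOf. It applies them by naive forward chaining; the function names and the sample triples are likewise hypothetical.

```python
TYPE = "rdf:type"
SUB = "rdfs:subClassOf"


def translate(triples):
    """Triple (x, P, y) becomes the fact holds2(P, x, y)."""
    return {("holds2", p, x, y) for (x, p, y) in triples}


def saturate(facts):
    """Apply the two assumed axioms until no new holds2 fact appears."""
    facts = set(facts)
    while True:
        new = set()
        for (_, p1, a, b) in facts:
            for (_, p2, c, d) in facts:
                if p2 == SUB and b == c and p1 in (TYPE, SUB):
                    # holds2(p1, a, b) and holds2(subClassOf, b, d)
                    # together give holds2(p1, a, d)
                    new.add(("holds2", p1, a, d))
        if new <= facts:
            return facts
        facts |= new


triples = [
    ("felix", TYPE, "Cat"),
    ("Cat", SUB, "Mammal"),
    ("Mammal", SUB, "Animal"),
]
closure = saturate(translate(triples))
print(("holds2", TYPE, "felix", "Animal") in closure)
```

Note that the class hierarchy is now data inside a single predicate, which is the interoperability concern raised on the slide: a reasoner that expects ∀x.C(x) → D(x) sees only holds2 facts.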
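
Illustration: triples introduced by list encodings. The “(Dis)advantages of RDF Thesis” slide notes that first-rest triples carry the same weight as ground facts even when the list order is arbitrary. A small sketch, assuming a simplified encoder with made-up node names (`_:l0`, `_:l1`), shows that two orderings of the same operands produce different sets of triples.

```python
def encode_list(items, prefix="_:l"):
    """Encode a non-empty sequence as rdf:first / rdf:rest triples."""
    triples = set()
    for n, item in enumerate(items):
        node = f"{prefix}{n}"
        rest = f"{prefix}{n + 1}" if n + 1 < len(items) else "rdf:nil"
        triples.add((node, "rdf:first", item))
        triples.add((node, "rdf:rest", rest))
    return triples


# The same two operands, e.g. of a union, written in either order.
ab = encode_list(["A", "B"])
ba = encode_list(["B", "A"])

print(sorted(ab))
print(ab == ba)
```

A union of A and B means the same thing in either order, yet the two encodings assert different facts, such as `_:l0 rdf:first A` versus `_:l0 rdf:first B`, and any consumer that reads the triples as ordinary RDF will treat them as distinct statements.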