What is the Semantic Web?

“The Semantic Web is an extension of the current web in which information is given a well-defined meaning, better enabling computers and people to work in cooperation.” -- Tim Berners-Lee et al. [The Semantic Web, Scientific American, 2001.]

“A set of standards and best practices for sharing data and the semantics of that data over the Web for use by applications” -- Bob DuCharme [Learning SPARQL, 2013.]

Standards:
1. RDF data model
2. SPARQL query language
3. RDFS and OWL standards for storing vocabularies and ontologies.

Best practices include the use of URIs (IRIs) to refer to entities on the web and the use of standards.

Semantic Web Layer Cake
[Figure not reproduced.] Source: http://www.semanticfocus.com/blog/entry/title/introduction-to-the-semantic-web-vision-and-technologies-part-2foundations/

The Semantic Web is a REALITY

Currently, the Semantic Web encompasses almost 10,000 databases, > 85 billion facts, and > 800 million links. These are publicly available data, identifiable via URIs and accessible via HTTP.

Example: DBpedia -- Wikipedia for the Semantic Web, which can be used by both humans and computers. For humans, information is returned as an HTML document; for computers, information is returned in machine-understandable RDF format.

The link http://dbpedia.org/resource/Central_Connecticut_State_University leads to:
• http://dbpedia.org/page/Central_Connecticut_State_University (returns the web page)
• http://dbpedia.org/data/Central_Connecticut_State_University (returns the machine-understandable representation)

About: Central Connecticut State University

An Entity of Type: Public university, from Named Graph: http://dbpedia.org, within Data Space: dbpedia.org

Property                              Value
dbo:abstract                          • Central Connecticut State University is a regional, comprehensive public university in New Britain, Connecticut. Founded in 1849 as Connecticut Normal School, CCSU is Connecticut's oldest publicly funded university. CCSU is made up of four schools: the Ammon School of Arts & Science, the School of Business, the School of Education & Professional Studies, and the School of Engineering & Technology. Attended by over 11,000 students, 9,200 are undergraduates, and 2,000 are graduate students. It is part of the Connecticut State University System, which also oversees Eastern, Western, and Southern Connecticut State Universities. Together they have a student body of over 34,000. As a commuter school, more than half of students live off campus and ninety percent are in-state students.
dbo:affiliation                       • dbr:Connecticut_State_University_System
dbo:athletics                         • dbr:National_Collegiate_Athletic_Association
dbo:campus                            • dbr:Suburb
dbo:city                              • dbr:New_Britain,_Connecticut
dbo:country                           • dbr:United_States
dbo:endowment                         • 4.7E7
dbo:formerName                        • Central Connecticut State College
                                      • Connecticut Normal School
                                      • New Britain Normal School
                                      • Teachers College of Connecticut
dbo:mascot                            • Blue Devil
dbo:numberOfPostgraduateStudents      • 2094 (xsd:integer)
dbo:numberOfUndergraduateStudents     • 9771 (xsd:integer)
dbo:officialSchoolColour              • Blue and White

CCSU info as a machine-understandable document

    <?xml version="1.0" encoding="utf-8" ?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xmlns:georss="http://www.georss.org/georss/"
        xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:dbp="http://dbpedia.org/property/"
        xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
        xmlns:ns7="http://www.w3.org/ns/prov#"
        xmlns:dbo="http://dbpedia.org/ontology/"
        xmlns:dct="http://purl.org/dc/terms/" >
      <rdf:Description rdf:about="http://dbpedia.org/resource/Yan_Klukowski__3">
        <dbo:team rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
      </rdf:Description>
      <rdf:Description rdf:about="http://dbpedia.org/resource/Dan_Gaspar__8">
        <dbo:team rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
      </rdf:Description>
      <rdf:Description rdf:about="http://dbpedia.org/resource/1999%E2%80%932000_Los_Angeles_Clippers_season">
        <dbp:college rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
      </rdf:Description>
      <rdf:Description rdf:about="http://dbpedia.org/resource/List_of_Phi_Beta_Sigma_chapters">
        <dbp:school rdf:resource="http://dbpedia.org/resource/Central_Connecticut_State_University" />
      </rdf:Description>
      …

XML (eXtensible Markup Language)

XML is a flexible text format that is used to structure, store, and transport data over the Web. Contrary to HTML, which is about displaying data, XML is about describing data, BUT there is no one standard way to describe the same data.

Example: Consider the concept COURSE and its instance CS462.

HTML description:

    <H1> CS462: AI </H1>
    <UL>
      <LI> CRN: 4185
      <LI> Level: undergrad/grad
      <LI> Professor: NZ, office hours …
      <LI> Website: www.cs.ccsu.edu/~neli
    </UL>

XML description:

    <course>
      <title> CS462: AI </title>
      <CRN> 4185 </CRN>
      <level> undergrad/grad </level>
      <Professor>
        <name> NZ </name>
        <officeHours> … </officeHours>
        <Website> … </Website>
      </Professor>
    </course>

XML documents are labeled trees
[Figure: tree rooted at Course, with nodes Professor, CRN, Name, Title, Level, Web site.]

XML (contd.)

XML documents are easily readable and understandable by humans, because their tags are familiar terms, but
• XML lacks semantics, and
• XML makes no commitment to an ontological vocabulary, nor to ontological modelling, i.e. it cannot serve as a knowledge representation language.

Because XML is a universal meta markup language, the same term can be given different meanings by different sources (for example, title can mean “book title” or “person title”).
To resolve such inconsistencies, namespaces are used. For example:
• xmlns:dc="http://purl.org/dc/elements/1.1/" defines the namespace dc (Dublin Core), and <dc:title>Artificial Intelligence</dc:title> suggests that the term title refers to a book.
• xmlns:v="http://www.w3.org/2006/vcard/" describes people, and <v:title>Doctor</v:title> suggests that the term title refers to a person.

RDF (Resource Description Framework)

• RDF is the foundation for representing and processing knowledge on the web. It is a graph-based data model, where knowledge is represented as a list of statements called triples.
• Each triple has the form “subject, predicate, object”. Example: “Jones TEACHES Math101”.
• Each element of a triple (a resource) is identified by a URI. Example, in N-Triples format:

    <http://myUniv.edu/people/Jones> <http://myUniv.edu/terms/teaches> <http://myUniv.edu/courses/Math101> .

RDF can be written in various ways (called serializations), one of which has an XML-based syntax to support syntactic interoperability.
Example:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:myUniv="http://myUniv.edu/terms/">
      <rdf:Description rdf:about="http://myUniv.edu/jones">
        <myUniv:teaches>
          <rdf:Description rdf:about="http://myUniv.edu/courses/Math101">
          </rdf:Description>
        </myUniv:teaches>
      </rdf:Description>
    </rdf:RDF>

RDF domain example -- Friend of a Friend (FOAF) domain

RDF class Person:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:Person>
        <foaf:name>Neli Zlatareva</foaf:name>
        <foaf:title>Dr</foaf:title>
        <foaf:givenname>Neli</foaf:givenname>
        <foaf:family_name>Zlatareva</foaf:family_name>
        <foaf:homepage rdf:resource="http://www.cs.ccsu.edu/~neli"/>
      </foaf:Person>
    </rdf:RDF>

RDF statements are directed labeled graphs

    <http://www.math.ccsu/jones> --<http://www.cs.ccsu.edu/~neli/univ.owl#teaches>--> <http://www.ccsu.edu/catalog/Math101>

• RDF is provided with a model-theoretic semantics that defines the notion of entailment between two RDF statements.
• RDF graphs are finite sets of RDF triples.
• These types of graphs are very similar to semantic nets.

Another example

Consider the following set of triples:

    { <?p1 foaf:name "Jones">,
      <?p1 foaf:knows ?p2>,
      <?p1 myUniv:teaches ?c1>,
      <?p2 myUniv:studies ?c1>,
      <?p2 foaf:name "Bob">,
      <?p2 foaf:mbox "Bob@mygmail.com">,
      <?c1 rdf:type myUniv:course>,
      <?c1 foaf:name "Math101"> }

where
    foaf: <http://xmlns.com/foaf/0.1/>
    rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

[Figure: the same triples drawn as a graph, with blank nodes _:p1, _:p2, and _:c1 in place of the variables.]

RDF Schema (RDF Vocabulary Description Language)

• RDF is a universal language that allows users to describe their own domains, but it does not make assumptions about any particular domain.
• RDF Schema defines the vocabulary, specifies object properties and their values, and describes the relations between objects.
• RDF Schema organizes this vocabulary in a typed class hierarchy.

Example (for short, in N3 format, which is a superset of N-Triples; it allows us to define URI prefixes at the beginning of the document and to identify entity URIs with respect to those prefixes):

    @prefix univ: <http://www.cs.ccsu.edu/~neli/univ.owl#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    univ:professor rdfs:subClassOf univ:staff .
    univ:staff rdf:type rdfs:Class .
    univ:professor rdf:type rdfs:Class .
    univ:Jones rdf:type univ:professor .

RDF/RDFS Example
[Figure: RDF layer: Jones teaches Math101. RDFS layer: classes Staff, Professor, TeachingAss.]

RDF and RDFS Axiomatic Semantics

• All language primitives are represented by constants, such as Resource, Class, Property, subClassOf, Literal, etc.
• A few predefined predicates are used to represent relations between constants, such as:
  – An RDF triple is represented as PropVal(P, R, V), where P is a property, R is a resource, and V is a value;
  – Predicate Type(R, T) states that resource R has the type T, and it is equivalent to PropVal(type, R, T).
• All classes are instances of Class and have the type Class, i.e. Type(Class, Class), Type(Property, Class), Type(Resource, Class), etc.
• Resource is the most general class: every class and every property is a resource.
• Predicates in RDF statements are properties.

RDF and RDFS Semantics (contd.)

In RDFS, we also have subclasses, subproperties, and constraints.

• subClassOf is a property, i.e. Type(subClassOf, Property).
• If class C is a subclass of class C’, then all instances of C are also instances of C’, i.e.
    PropVal(subClassOf, ?c, ?c’) ↔ (Type(?c, Class) & Type(?c’, Class) & ∀?x (Type(?x, ?c) → Type(?x, ?c’)))
• Property P is a subproperty of property P’ if P’(x, y) holds whenever P(x, y) holds, i.e.
    Type(subPropertyOf, Property)
    PropVal(subPropertyOf, ?p, ?p’) ↔ (Type(?p, Property) & Type(?p’, Property) & ∀?r ∀?v (PropVal(?p, ?r, ?v) → PropVal(?p’, ?r, ?v)))
• Every constraint resource is a resource, i.e.
    PropVal(subClassOf, ConstraintResource, Resource)
• Constraint properties are all properties that are also constraint resources, i.e.
    Type(?cp, ConstraintProperty) ↔ (Type(?cp, ConstraintResource) & Type(?cp, Property))

RDF and RDFS Semantics (contd.)

• domain and range are constraint properties, i.e.
    Type(domain, ConstraintProperty)
    Type(range, ConstraintProperty)
• The domain of a property P is the set of all objects to which P applies, i.e.
    PropVal(domain, ?p, ?d) → ∀?x ∀?y (PropVal(?p, ?x, ?y) → Type(?x, ?d))
• The range of a property P is the set of all values that P can take, i.e.
    PropVal(range, ?p, ?r) → ∀?x ∀?y (PropVal(?p, ?x, ?y) → Type(?y, ?r))

Given all these axioms, we can derive the following formulas:
    PropVal(domain, range, Property)
    PropVal(range, range, Class)
    PropVal(domain, domain, Property)
    PropVal(range, domain, Class)

Example. Given PropVal(subClassOf, Professor, Staff), PropVal(domain, teaches, Professor), and PropVal(teaches, Jones, Math1), we can derive Type(Jones, Staff): the domain axiom gives Type(Jones, Professor), and the subClassOf axiom then gives Type(Jones, Staff).

A Direct Inference System for RDF and RDFS

• Based on rules of the form
    If E contains certain triples
    Then add to E certain triples,
  where E is a set of RDF triples.
• Example rules (from W3C RDF recommendations):

    If E contains the triple (?x, ?p, ?y)
    Then E also contains the triple (?p, rdf:type, rdf:Property)

    If E contains the triples (?u, rdfs:subClassOf, ?v) and (?v, rdfs:subClassOf, ?w)
    Then E also contains the triple (?u, rdfs:subClassOf, ?w)

    If E contains the triples (?x, rdf:type, ?u) and (?u, rdfs:subClassOf, ?v)
    Then E also contains the triple (?x, rdf:type, ?v)

    If E contains the triples (?x, ?p, ?y) and (?p, rdfs:range, ?u)
    Then E also contains the triple (?y, rdf:type, ?u)

How is inference in RDF different from inference in RDFS?
Consider the following triples:
• myUniv:Student1 rdf:type myUniv:TeachingAssistant .
• myUniv:TeachingAssistant rdfs:subClassOf myUniv:Staff .

RDF inference will not return an answer to the query that retrieves all staff members, i.e. (?x, rdf:type, myUniv:Staff), because there is no triple matching this pattern.

RDFS will return the instances of the TeachingAssistant class using the rule (called the “type propagation” rule)

    If E contains the triples (?x, rdf:type, ?u) and (?u, rdfs:subClassOf, ?v)
    Then E also contains the triple (?x, rdf:type, ?v)

Types of inferences in SW applications

1. Class membership: if x is an instance of class C, and C is a subclass of D, we want to infer that x is an instance of D.
2. Equivalence of classes: if class A is equivalent to class B, and class B is equivalent to class C, then A is equivalent to C.
3. Classification: if a property-value pair is declared to be a sufficient condition for membership in class A, then if individual x satisfies this condition, x must be an instance of A.
4. Consistency: if x is declared to be an instance of class A where A ⊆ B ∩ C, A ⊆ D, and B ∩ D = ∅, then the ontology is inconsistent, because class A must be empty but instead x ∈ A.

Multiple inheritance and RDFS

Consider the rule

    If E contains the triples (?x, rdf:type, ?u) and (?u, rdfs:subClassOf, ?v)
    Then E also contains the triple (?x, rdf:type, ?v)

Assume the following triples:
a. “Bob” rdf:type myUniv:TeachingAssistant .
b. “Bob” rdf:type myUniv:Student .
c. myUniv:TeachingAssistant rdfs:subClassOf myUniv:Staff .

From a. and c. and the above rule we can derive
    “Bob” rdf:type myUniv:Staff .
The latter may be inconsistent with b. if Staff and Student are supposed to be disjoint classes. But disjointness of classes cannot be expressed in RDFS.

In RDFS, if ?A is subClassOf ?B, and ?A is subClassOf ?C, then any individual ?x that is a member of ?A will also be a member of ?B and ?C.
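The type propagation rule and the “Bob” example above can be sketched in Python as a small forward-chaining loop. This is only an illustrative sketch, not a real RDFS reasoner: triples are plain tuples of strings, and the names rdfs_closure and instances_of are hypothetical helpers introduced here. It applies just two of the rules listed earlier (subclass transitivity and type propagation) until no new triples appear.

```python
# Illustrative sketch only: triples are (subject, predicate, object) tuples of
# strings, and prefixed names such as "myUniv:Staff" are treated as opaque labels.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Apply subclass transitivity and type propagation until a fixed point."""
    closed = set(triples)
    while True:
        derived = set()
        for (s, p, o) in closed:
            for (u, q, v) in closed:
                # We only combine a triple with a subclass statement about its object.
                if q != SUBCLASS_OF or u != o:
                    continue
                if p == SUBCLASS_OF:
                    # (s subClassOf o), (o subClassOf v) => (s subClassOf v)
                    derived.add((s, SUBCLASS_OF, v))
                elif p == RDF_TYPE:
                    # (s type o), (o subClassOf v) => (s type v)
                    derived.add((s, RDF_TYPE, v))
        if derived <= closed:
            return closed
        closed |= derived


def instances_of(triples, cls):
    """Answer the pattern (?x, rdf:type, cls) by direct lookup, with no inference."""
    return sorted(s for (s, p, o) in triples if p == RDF_TYPE and o == cls)


facts = {
    ("Bob", RDF_TYPE, "myUniv:TeachingAssistant"),
    ("Bob", RDF_TYPE, "myUniv:Student"),
    ("myUniv:TeachingAssistant", SUBCLASS_OF, "myUniv:Staff"),
}

plain_rdf_answer = instances_of(facts, "myUniv:Staff")
rdfs_answer = instances_of(rdfs_closure(facts), "myUniv:Staff")

print(plain_rdf_answer)  # [] : no triple matches the pattern directly
print(rdfs_answer)       # ['Bob'] : derived by type propagation
```

Note that the closure keeps both “Bob is a Student” and “Bob is a Staff member”: nothing in this rule set can flag the two as conflicting, which mirrors the point that RDFS cannot express disjointness.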
In the same way, range definitions in RDFS are not used to restrict the values a property can take, but to infer class membership of those values.

What can we deduce in RDFS?

In summary, the inference capabilities of RDFS are limited to the following:

1. Given the domain and the range of a property, we can deduce:
   • Class membership from the domain of a property.
   • Class membership from the range of a property.
   Example: given that “Course isTaughtBy Professor” and “Math101 isTaughtBy Jones”, we can derive that Math101 ∈ Course and Jones ∈ Professor.
2. Given a class hierarchy, we can deduce superclass membership.
   Example: given that Professor ⊆ Staff and Jones ∈ Professor, we can derive that Jones ∈ Staff.
3. Given a property hierarchy, we can deduce new facts from subproperty relationships.
   Example: from teachAt ⊆ employedBy and “Jones teachAt CCSU” we can derive that “Jones employedBy CCSU”.

What can we not deduce in RDFS?

1. We can’t say that two classes are disjoint, i.e. we can define Student and Staff as subclasses of the Person class, but cannot say that they are disjoint.
2. Property range is defined globally for all classes; we can’t declare range restrictions that apply to some classes only, i.e. exceptions are not allowed.
3. We can’t build Boolean combinations of classes. For example, we may want to declare a new class, person, which is the disjoint union of classes male and female.
4. Cardinality restrictions are not allowed. For example, we can’t say that a person has exactly two parents, or that a class has exactly one instructor.
5. We can’t declare a property to be the inverse of another property, transitive, functional, etc.

SPARQL: a query language for RDF and RDFS

SPARQL is based on the RDF Turtle serialization and a basic graph pattern matching algorithm.

Example:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?c
    WHERE { ?c rdf:type rdfs:Class . }

This query retrieves all classes.
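To make the idea of basic graph pattern matching more concrete, the following Python sketch evaluates triple patterns against a small in-memory set of triples. It is a simplified illustration, not how a SPARQL engine is implemented: the functions match_pattern and select are hypothetical names, terms are plain strings, and the data reuses the univ: statements from the RDF Schema example.

```python
# Simplified illustration: a term starting with "?" is a variable, anything else
# must match exactly. All names here are assumptions made for this sketch.
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple; return None if impossible."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must keep the value it was bound to earlier, if any.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def select(variables, patterns, triples):
    """Return sorted, distinct value tuples for variables over all solutions."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return sorted({tuple(b[v] for v in variables) for b in bindings})


triples = [
    ("univ:staff", "rdf:type", "rdfs:Class"),
    ("univ:professor", "rdf:type", "rdfs:Class"),
    ("univ:professor", "rdfs:subClassOf", "univ:staff"),
    ("univ:Jones", "rdf:type", "univ:professor"),
]

# Counterpart of: SELECT ?c WHERE { ?c rdf:type rdfs:Class . }
classes = select(["?c"], [("?c", "rdf:type", "rdfs:Class")], triples)
print(classes)  # [('univ:professor',), ('univ:staff',)]

# Two patterns sharing ?c act as a join: instances of direct subclasses of staff.
staff_members = select(
    ["?x"],
    [("?x", "rdf:type", "?c"), ("?c", "rdfs:subClassOf", "univ:staff")],
    triples,
)
print(staff_members)  # [('univ:Jones',)]
```

The second query finds Jones only because the subclass link is spelled out as a pattern; a plain pattern match performs no RDFS inference on its own.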
Using SPARQL
[Figure not reproduced.]

What can we do with SPARQL?

1. Extract data as RDF subgraphs, URIs, blank nodes, etc.
2. Explore data by querying for unknown relations.
3. Transform RDF data from one vocabulary into another.
4. Construct new RDF graphs based on RDF query graphs.
5. Update RDF graphs.
6. Do logical entailment for RDF, RDFS, and OWL.
7. Do federated queries over different SPARQL endpoints.

Basic query … and the result
[Figures not reproduced.]

Another example … and the result
[Figures not reproduced.]

Ontologies and the Semantic Web

“An ontology is an explicit, formal specification of a shared conceptualization. The term is borrowed from philosophy, where an ontology is a systematic account of existence. For AI systems, what ‘exists’ is that which can be represented.” T. Gruber, 1993.

Ontologies are represented via Classes, Relationships, and Instances. Constraints can be imposed on relationships to define allowed values.

There are several types of ontologies (see textbook, pp. 462 – 468: Internet shopping world example):
• Upper (top-level) ontologies, representing concepts such as space, time, event, etc. that are universally valid.
• Domain ontologies, representing concepts in a generic domain.
• Task ontologies, representing concepts related to a particular task.
• Application ontologies, representing specific application task-oriented domains.

An Ontology Example

• Visit http://protege.stanford.edu/ to learn about creating ontologies.
[Figure: fruit ontology.] Source: http://www.sei.cmu.edu/isis/guide/gifs/fruit-ontology.gif

OWL: The Web Ontology Language

The original OWL language, OWL 1, was intended to provide richer expressiveness than RDFS, which is why it was based on the SHOIN(D) logic. More expressive power, however, may lead to undesirable computational properties, which is why OWL 1 was designed in 3 different flavors to address different knowledge representation needs:

1. OWL Full: fully compatible with RDFS, which is further extended with cardinality constraints and other means for maximum expressivity, BUT the language is undecidable.
2. OWL DL: a subset of OWL Full that allows for efficient reasoning, but is not fully compatible with RDFS.
3. OWL Lite: a subset of OWL DL that does not allow for enumerated classes, disjointness, and arbitrary cardinality.

The latest version of OWL, OWL 2, is based on the SROIQ(D) logic. It also comes in different flavors: OWL EL, OWL RL, and OWL QL, which are all subsets of OWL 2 DL, which in turn is a subset of OWL 2 Full.

OWL is based on the Open World Assumption, which states that the absence of information is not a reason to assume that this information is false.

OWL does not rely on the Unique Name Assumption (which is the case with databases), i.e. if two names are not explicitly stated to be different, they may refer to the same individual.

OWL RDF/RDFS relation
[Figure: hierarchy under rdfs:Resource, showing rdfs:Class with owl:Class, and rdf:Property with owl:ObjectProperty and owl:DatatypeProperty.]

OWL uses RDF syntax. owl:Class is a specialization of rdfs:Class, and owl:DatatypeProperty and owl:ObjectProperty are specializations of rdf:Property.

OWL 1 Syntax

OWL 1 is based on the SHOIN(D) logic, which provides for the following expressiveness:
• The TBox defines subsumption relationships between classes (ex. C ⊑ D).
• The ABox contains facts about class membership (ex. C(a), C(b)), property relations (ex. R(a, b)), and equality and difference relations between individuals (ex. a = b, a ≠ b).
• The RBox defines subsumption/inclusion relationships between properties (ex. R ⊑ S), inverse properties (ex. R⁻), and transitivity of properties (ex. R⁺ ⊑ R).
• Class constructors: conjunction (C ⊓ D), disjunction (C ⊔ D), negation (¬C).
• Property restrictions: universal (∀R.C) and existential (∃R.C).
• Number restrictions: ≥ n R and ≤ n R.
• Closed classes (nominals): {a}.
• Datatypes.

OWL 2 Syntax

OWL 2 is based on the SROIQ(D) logic, which provides for the following additional expressiveness compared to OWL 1:
• The TBox also allows for equivalence relationships between classes (ex. C ≡ D) and for a special class expression Self: ∃S.Self. We can also state ≥ n S.C and ≤ n S.C.
• The ABox also allows for negated property relations (ex. ¬R(a, b)).
• The RBox may contain, in addition to simple properties, inverse properties (ex. R⁻) and universal properties (U), and also allows for general inclusion (ex. R1 ∘ R2 ⊑ S), symmetry, reflexivity, irreflexivity, and disjointness of properties.

There are different syntax versions of OWL 2:
• Functional syntax (substitutes the abstract syntax of OWL 1)
• RDF/XML syntax (extends the existing OWL/RDF syntax)
• OWL/XML syntax (new XML serialization)
• Manchester syntax (human-readable, intended for ontology editors)
• Turtle syntax (human-readable)

Want to build an OWL ontology yourself?

Although some OWL serializations are not very hard to use in an application-development setting, ontology editor applications are easy to learn and are much more efficient ontology development tools. My #1 choice is PROTÉGÉ, developed at Stanford University and freely available at http://protege.stanford.edu. It comes in two versions:
• Web application
• Desktop application

Both come with extensive documentation, including Ontology Development 101: A Guide to Creating Your First Ontology @ http://protegewiki.stanford.edu/wiki/Ontology101