9 Ontologies and the Semantic Web

Administrative

Lab Tutorial: Wednesday, Dec 7, 12:00 - 13:00 in 202
Today: Finish at 5:00 ;-)

Background and History

Conception
Tim Berners-Lee, 1999:
I have a dream for the Web (in which computers) become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.

Warning
Lots of hype surrounds the Semantic Web – it is important to keep the focus on what it can actually do.

Basic Idea: From Documents to Data

The World Wide Web: Linked Documents
• documents are accessible from any location
• documents are indexed and searchable
• documents can (and usually do) reference one another
• documents can be updated dynamically

The Semantic Web: Linked Data
• data sets are accessible from any location
• data sets can be queried and combined
• data sets can (and usually do) reference one another
• data sets can be updated with immediate effect

RDF: A Data Format for Linked Data

Goal
A data format that allows everybody to say anything about anything.
• applicable to all conceivable forms of data
• machine-readable and standardised
• platform independent

Example: Relational Databases

ISBN | Author          | Title                | Publisher  | Pages
001  | Joseph Conrad   | Heart of Darkness    | Wordsworth | 140
002  | Adam Hochschild | King Leopold’s Ghost | Macmillan  | 232

As triples:
• Book 001 is written by Joseph Conrad.
• Book 001 is titled ‘Heart of Darkness’.
• ...
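
The step from table rows to triples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `book:` and `ex:` prefixes and the lower-cased column names used as predicates are hypothetical and are not part of any real vocabulary. Each row yields one subject (built from the ISBN key), and every other cell becomes one (subject, predicate, object) triple.

```python
# Rows of the relational table from the example above.
ROWS = [
    {"ISBN": "001", "Author": "Joseph Conrad", "Title": "Heart of Darkness",
     "Publisher": "Wordsworth", "Pages": 140},
    {"ISBN": "002", "Author": "Adam Hochschild", "Title": "King Leopold's Ghost",
     "Publisher": "Macmillan", "Pages": 232},
]


def rows_to_triples(rows, key="ISBN"):
    """Turn every non-key cell into one (subject, predicate, object) triple.

    The 'book:' and 'ex:' prefixes are hypothetical, chosen for illustration.
    """
    triples = []
    for row in rows:
        subject = f"book:{row[key]}"  # the key column identifies the resource
        for column, value in row.items():
            if column != key:
                triples.append((subject, f"ex:{column.lower()}", value))
    return triples


TRIPLES = rows_to_triples(ROWS)
for triple in TRIPLES:
    print(triple)
```

Every row contributes four triples here, one per non-key column, so the table and the triple set carry the same information.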
RDF data on the Web

Different Formats:

Text Files
• formats: Turtle, RDF/XML, N3
• accessible over standard protocols (mainly HTTP)

SPARQL Endpoints
• similar to relational databases
• queries over standard protocols
• returned data is machine-readable

Data-Centric Architecture
• can be queried using standard protocols
• can be linked through URIs
• URIs themselves can be referenced

More on RDF

General Pattern: RDF triples
• triples consist of subject, predicate, object
• subject and predicate are ‘resources’
• object can be a resource (e.g. Conrad) or a literal (e.g. ‘Heart of Darkness’)

RDF triples define Graphs
• each (s, p, o)-triple defines a labelled edge in a graph
• predicates are relations
• the graph defines an ALC-ABox

Anatomy of Resources

Triples (from the W3C Recommendation)
An RDF triple contains three components:
• the subject, which is an RDF URI reference or a blank node
• the predicate, which is an RDF URI reference
• the object, which is an RDF URI reference, a literal or a blank node

Remark
• central theme: resources live in a global namespace
• public resources are URIs with an optional fragment identifier (#)
• local resources can be expressed using blank nodes
• object can be a resource or a literal

RDF Example

Abstract Example
[Figure: node John with an edge labelled ‘knows’ to node Pat, and an edge labelled ‘name’ to the literal "John Smith"]

Concrete Example (in N3 syntax)
• properties ‘knows’ and ‘name’ are already defined
• import namespace to make references shorter

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    <http://example.org/John> foaf:knows <http://example.org/Pat> .
    <http://example.org/John> foaf:name "John Smith" .

Almost Everything is a URI

Why URIs?
• URIs live in a namespace controlled by the creator
• simple rule of thumb: only use your own namespace!
• minimal risk of overlapping names
• established reference mechanism

Caveat
• the URI model does not prevent name clashes
• a URI is just a string and has no intrinsic meaning

RDF Graphs are Role Assertions

Translation: Blank Nodes and Literals
(s, p, o) ↦ (s, o) : p

Literal Values
• used to describe ‘atomic’ information about nodes
• can be typed: String, Integer, ...
• cannot be de-referenced!
• (Formally: an extension of ALC with data types)

Blank Nodes
• blank nodes live in a local namespace
• they represent nodes that aren’t globally visible
• blank nodes assert the existence of an object
• they don’t reveal the identity of this object
• blank nodes from different graphs must not be identified (equated) when merging graphs!

Blank Nodes: Example

Alice knows a person
• people:Alice foaf:knows _:x
• _:x rdf:type foaf:Person

Alice owns a car
• people:Alice example:owns _:x
• _:x rdf:type example:car

Problem of Careless Merging
Taking the union of both graphs gives unintended results: ‘There is a thing (x) which is both a person and a car, which (whom?) Alice both knows and owns’.

RDF Vocabularies

Vocabularies
An RDF vocabulary is just a set of URIs (used to structure a set of names).

Example 75. RDF itself is an RDF vocabulary and contains
• rdf:type – used for typing resources
• rdf:Property – distinguishes relations from objects

Example 76. Other RDF vocabularies:
• RDFS (RDF Schema), defines ‘Resource’, ‘Class’, ‘Property’
• FOAF (Friend of a Friend), defines ‘knows’, ‘Person’, ...
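
The careless-merging problem from the blank-node example above (Alice knows a person, Alice owns a car) can be reproduced with plain Python sets of triples. This is a minimal sketch, not an RDF library: the helper names `is_blank`, `rename_blank_nodes`, `merge` and `types_of` are hypothetical, and the `_:` prefix convention for blank nodes is assumed. The safe variant renames blank nodes apart before taking the union.

```python
def is_blank(node):
    """Blank nodes are assumed to be strings starting with '_:'."""
    return isinstance(node, str) and node.startswith("_:")


def rename_blank_nodes(graph, tag):
    """Return a copy of the graph whose blank nodes carry a graph-specific tag."""
    def rename(node):
        return f"_:{tag}_{node[2:]}" if is_blank(node) else node
    # Only subjects and objects can be blank nodes; predicates are always URIs.
    return {(rename(s), p, rename(o)) for (s, p, o) in graph}


def merge(graph_a, graph_b):
    """Rename blank nodes apart, then take the union of the two graphs."""
    return rename_blank_nodes(graph_a, "a") | rename_blank_nodes(graph_b, "b")


def types_of(graph, node):
    """All objects of rdf:type triples with the given subject."""
    return {o for (s, p, o) in graph if s == node and p == "rdf:type"}


KNOWS = {
    ("people:Alice", "foaf:knows", "_:x"),
    ("_:x", "rdf:type", "foaf:Person"),
}
OWNS = {
    ("people:Alice", "example:owns", "_:x"),
    ("_:x", "rdf:type", "example:car"),
}

CARELESS = KNOWS | OWNS      # plain union: the two '_:x' nodes collapse
SAFE = merge(KNOWS, OWNS)    # blank nodes stay separate

print(types_of(CARELESS, "_:x"))
print(types_of(SAFE, "_:a_x"), types_of(SAFE, "_:b_x"))
```

In the careless union a single node is typed as both a person and a car; after renaming, each graph keeps its own anonymous node while the shared URI `people:Alice` is still identified across both.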
Multiple Datasets: Merging

Datasets can be merged
• graphs from different sources can be merged
• nodes with the same URI are considered identical
• unlabelled nodes are kept separate
• no limitations on graphs that can be merged
• any RDF can be merged with any other RDF

Remarks
• ‘nothing that has been said can be unsaid’
• merging is safe: creates larger graphs

Merging and Blank Nodes

Merging is Simple
• merging of RDF graphs is just taking their union
• identical URIs are identified
• blank nodes from different graphs must not be identified

Merging Mechanism
1. ensure that blank nodes in different data sets have different names
2. take the union of the graphs and collapse nodes with the same names

So Far: Untyped Graphs

RDF Schema
• specifies typing for resources
• provides information about properties
• (ontologies in the small)

Uses for RDF Schemas
• define URIs for new properties
• define URIs for new classes
• define subclass (sub-property) relationships
• define ranges and domains of properties

FOAF example

Example 77. From the FOAF definition (in XML notation)

    <rdfs:Class rdf:about="http://xmlns.com/foaf/0.1/Person"
                rdfs:label="Person"
                rdfs:comment="A person."
                vs:term_status="stable">
      <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
      ....
      <owl:disjointWith rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
      <owl:disjointWith rdf:resource="http://xmlns.com/foaf/0.1/Project"/>
    </rdfs:Class>

(Note the disjointness and typing information)

OWL: The Web Ontology Language

OWL is ALC in XML-Notation
• more powerful than RDFS
• used to define classes and their relationships
• machine-readable XML format
• several versions: OWL-Lite, OWL-DL and OWL-Full
• see http://www.w3.org/TR/owl-features/

Example

Example 78.
The Pizza Ontology in OWL:

    <owl:ObjectProperty rdf:about="#hasBase">
      <rdfs:domain rdf:resource="#Pizza"/>
      <rdfs:range rdf:resource="#PizzaBase"/>
    </owl:ObjectProperty>

(Note that domain and range are expressible in RDFS)

    <owl:Class rdf:about="#Pizza">
      <rdfs:subClassOf>
        <owl:Restriction>
          <owl:onProperty rdf:resource="#hasBase"/>
          <owl:someValuesFrom rdf:resource="#PizzaBase"/>
        </owl:Restriction>
      ...
    </owl:Class>

Ontologies and RDF Graphs

Main Purpose
• an ontology (or RDF Schema) provides the vocabulary (types)
• ontologies can be merged – need to maintain consistency!
• RDF data populates ontologies – need to perform inferencing!

Main Application: Querying Distributed Data

Ingredients:
• one or more ontologies (describing the vocabulary)
• one or more data sets (specifying the data)

Type of Query:
• often combines data from different graphs
• e.g. ‘all vegetarian pizzas’ queried on data from two delivery services
⇒ all data sets must be formulated using a common vocabulary
⇒ need to ‘know’ that, e.g., Margheritas are vegetarian

SPARQL Queries

Anatomy of a SPARQL Query
• Prefix declarations, for abbreviating URIs
• A result clause, identifying what information to return from the query
• Dataset definition, stating what RDF graph(s) are being queried
• The query pattern, specifying what to query for in the underlying dataset
• Query modifiers, slicing, ordering, and otherwise rearranging query results

SPARQL Queries

Concrete Structure

    # prefix declarations
    PREFIX foo: <http://example.com/resources/>
    ...
    # result clause
    SELECT ...
    # dataset definition
    FROM ...
    # query pattern
    WHERE {
      ...
    }
    # query modifiers
    ORDER BY ...

SPARQL Context

RDF Data
• queries are executed against RDF datasets (that define a relational structure)
• can combine more than one data source
• links to other data sources can be followed

SPARQL endpoints
• generic endpoint: queries against any web-accessible data
• specific endpoint: hardwired against a particular data set

Returned Data
• XML.
  SPARQL specifies an XML vocabulary for returning tables of results.
• other formats possible: JSON, RDF, HTML

Examples

Data Set (in N3 Syntax)

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix people: <http://www.doc.ic.ac.uk/~dirk/people#> .

    # Dirk Pattinson
    people:dirk rdf:type foaf:Person .
    people:dirk foaf:name "Dirk Pattinson" .
    people:dirk foaf:title "Dr" .
    people:dirk foaf:firstName "Dirk" .
    people:dirk foaf:surname "Pattinson" .
    people:dirk foaf:mbox "dirk@imperial.ac.uk" .
    people:dirk foaf:based_near <http://dbpedia.org/resource/London> .
    people:dirk foaf:knows _:john .
    people:dirk rdfs:seeAlso <http://www.doc.ic.ac.uk/~dirk/more.n3> .

    # John Doe (a blank node)
    _:john a foaf:Person .
    _:john foaf:name "John Doe" .

Examples

Data Set (in XML Syntax)

    <?xml version="1.0"?>
    <rdf:RDF
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:people="http://www.doc.ic.ac.uk/~dirk/people#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <foaf:Person rdf:about="http://www.doc.ic.ac.uk/~dirk/people#dirk">
        <foaf:name>Dirk Pattinson</foaf:name>
        <foaf:title>Dr</foaf:title>
        <foaf:firstName>Dirk</foaf:firstName>
        <foaf:surname>Pattinson</foaf:surname>
        <foaf:mbox>dirk@imperial.ac.uk</foaf:mbox>
        <foaf:based_near rdf:resource="http://dbpedia.org/resource/London" />
        <foaf:knows>
          <foaf:Person>
            <foaf:name>John Doe</foaf:name>
          </foaf:Person>
        </foaf:knows>
        <rdfs:seeAlso rdf:resource="http://www.doc.ic.ac.uk/~dirk/more.n3" />
      </foaf:Person>
    </rdf:RDF>

Examples

Data Set (underlying Triples)

    <http://www.doc.ic.ac.uk/~dirk/people#dirk>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://xmlns.com/foaf/0.1/Person> .
    <http://www.doc.ic.ac.uk/~dirk/people#dirk>
        <http://xmlns.com/foaf/0.1/name>
        "Dirk Pattinson" .
    [ some triples omitted ]
    <http://www.doc.ic.ac.uk/~dirk/people#dirk>
        <http://xmlns.com/foaf/0.1/based_near>
        <http://dbpedia.org/resource/London> .
    <http://www.doc.ic.ac.uk/~dirk/people#dirk>
        <http://xmlns.com/foaf/0.1/knows>
        _:john .
    _:john
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://xmlns.com/foaf/0.1/Person> .
    _:john
        <http://xmlns.com/foaf/0.1/name>
        "John Doe" .

Example Queries

Simple Search

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?mbox
    WHERE {
      ?x foaf:name ?name .
      ?x foaf:mbox ?mbox .
    }

Notes
• variables are denoted with ?
• matches data in the RDF graph against variables
• returns all matches
• formally: all individuals that are instances of the given concept

Semantics of Matching

In Relation to ALC
• the ontology (or RDF Schema) specifies a TBox
• triples (s, p, o) are translated to concepts
• the set of matches is given by the set of all instances

Example 79. Matching in a WHERE-clause, e.g.:

    ?x foaf:name ?name .

ALC-Semantics:
• x and name are individuals
• as a concept assertion: x : ∃name.?name
• matches (x, name) where x is an instance of ∃foaf:name.name
• instance checking is performed over the given ontology

More Example Queries

More Constraints: near London

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?mbox
    WHERE {
      ?x foaf:name ?name .
      ?x foaf:mbox ?mbox .
      ?x foaf:based_near <http://dbpedia.org/resource/London>
    }

Remark
• just as before, but with an additional constraint
• note the explicit URI in the query

Federated Queries

People based near Cities with name
• the name (London) is a literal, not a URI
• use the DBpedia SPARQL endpoint in a sub-query to find the URI

Query Example

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbterm: <http://dbpedia.org/property/>
    SELECT ?name ?mbox
    WHERE {
      SERVICE <http://dbpedia.org/sparql> {
        ?loc dbterm:name "London"@en
      }
      ?x foaf:name ?name .
      ?x foaf:mbox ?mbox .
      ?x foaf:based_near ?loc .
    }

Optional Components

Example Query

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?age
    WHERE {
      ?x foaf:name ?name .
      OPTIONAL { ?x foaf:age ?age . }
    }

• returns the information marked optional if it is available; matches without it are still returned

Query Modifiers

Example Query

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name ?age
    WHERE {
      ?x foaf:name ?name .
      OPTIONAL { ?x foaf:age ?age . }
    }
    LIMIT 20

• returns at most 20 results

More Query Modifiers
• e.g. ORDER BY ?name – orders the results by name
• e.g. OFFSET 20 – skips the first 20 matches and returns the 21st and all subsequent ones

References and Further Reading: RDF

RDF
We have only described RDF very briefly.
• W3C RDF Specification: http://www.w3.org/TR/REC-rdf-syntax/
• a tutorial (beware: lots of advertising): http://www.w3schools.com/rdf/default.asp

References and Further Reading: SPARQL

SPARQL
• W3C specification: http://www.w3.org/TR/rdf-sparql-query/
• a tutorial: http://eneumann.org/talks/Sparql_tutorial.html
• Cambridge Semantics Tutorial: http://www.cambridgesemantics.com/2008/09/sparql-by-examp
• Jena SPARQL tutorial: http://jena.sourceforge.net/ARQ/Tutorial/