9 Ontologies and the Semantic Web

Administrative
Lab Tutorial: Wednesday, Dec 7, 12:00 - 13:00 in 202
Today: Finish at 5:00 ;-)
Background and History
Conception
Tim Berners-Lee, 1999:
I have a dream for the Web (in which computers) become capable of analyzing all the
data on the Web – the content, links, and transactions between people and computers. A
’Semantic Web’, which should make this possible, has yet to emerge, but when it does,
the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by
machines talking to machines. The ’intelligent agents’ people have touted for ages will
finally materialize.
Warning
Lots of hype surrounding the Semantic Web – important to keep focus on what it can do.
Basic Idea: From Documents to Data
The World Wide Web: Linked documents
• documents are accessible from any location
• documents are indexed and searchable
• documents can (and usually do) reference one another
• documents can be updated dynamically
The Semantic Web: Linked Data
• data sets are accessible from any location
• data sets can be queried and combined
• data sets can (and usually do) reference one another
• data sets can be updated with immediate effect
RDF: A Data Format for Linked Data
Goal
A data format that allows everybody to say anything about anything.
• applicable to all conceivable forms of data
• machine-readable and standardised
• platform independent
Example: Relational Databases
ISBN | Author          | Title                | Publisher  | Pages
001  | Joseph Conrad   | Heart of Darkness    | Wordsworth | 140
002  | Adam Hochschild | King Leopold’s Ghost | Macmillan  | 232
As triples:
• Book 001 is written by Joseph Conrad.
• Book 001 is titled ‘Heart of Darkness’.
• ...
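The row-to-triple translation can be sketched in a few lines of Python. This is only an illustration: the `book:` prefix and the lower-cased column names used as predicates are assumptions, not part of any standard vocabulary.

```python
# Each table row yields one triple per non-key column.
# The "book:" prefix and the predicate names are illustrative assumptions.
rows = [
    {"ISBN": "001", "Author": "Joseph Conrad", "Title": "Heart of Darkness",
     "Publisher": "Wordsworth", "Pages": 140},
    {"ISBN": "002", "Author": "Adam Hochschild", "Title": "King Leopold's Ghost",
     "Publisher": "Macmillan", "Pages": 232},
]


def row_to_triples(row, key="ISBN"):
    """Turn one relational row into (subject, predicate, object) triples."""
    subject = "book:" + row[key]
    return [(subject, column.lower(), value)
            for column, value in row.items() if column != key]


triples = [t for row in rows for t in row_to_triples(row)]
for triple in triples:
    print(triple)
```

The key column identifies the subject; every other cell becomes the object of a triple whose predicate is derived from the column header.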
RDF data on the Web
Different Formats:
Text Files
• formats: Turtle, RDF/XML, N3
• accessible over standard protocols (mainly HTTP)
SPARQL Endpoints
• similar to relational databases
• queries over standard protocols
• returned data is machine-readable
Data-Centric Architecture
• can be queried using standard protocols
• can be linked through URIs
• URIs themselves can be referenced
More on RDF
General Pattern: RDF triples
• triples have the form (subject, predicate, object)
• subject and predicate are ‘resources’
• object can be a resource (e.g. Conrad) or a literal (e.g. ‘Heart of Darkness’)
RDF triples define Graphs
• each (s, p, o)-triple defines a labelled edge in a graph
• predicates are relations
• defines an ALC-ABox.
Anatomy of Resources
Triples (from the W3C Recommendation)
An RDF triple contains three components:
• the subject, which is an RDF URI reference or a blank node
• the predicate, which is an RDF URI reference
• the object, which is an RDF URI reference, a literal or a blank node
Remark
• central theme: resources live in global namespace
• public resources are URIs with optional fragment identifier (#)
• local resources can be expressed using blank nodes
• object can be a resource or a literal
RDF Example
Abstract Example
[Figure: node ‘John’ with an edge labelled ‘knows’ to node ‘Pat’, and an edge labelled ‘name’ to the literal "John Smith"]
Concrete Example (in N3 syntax)
• properties ’knows’ and ’name’ are already defined
• import namespace to make references shorter
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://example.org/John> foaf:knows <http://example.org/Pat> .
<http://example.org/John> foaf:name "John Smith" .
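As a rough sketch of how these two triples form a labelled graph, the following Python stores each triple as an outgoing edge of its subject. The tagged tuples `("uri", ...)` and `("literal", ...)` are an illustrative encoding chosen here, not an RDF convention.

```python
from collections import defaultdict

FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"

# The two triples of the example. Objects are tagged so that resources and
# literals can be told apart (the tagging scheme is an assumption).
triples = [
    (EX + "John", FOAF + "knows", ("uri", EX + "Pat")),
    (EX + "John", FOAF + "name", ("literal", "John Smith")),
]


def to_graph(triples):
    """Map each subject to its list of (predicate, object) edges."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


graph = to_graph(triples)
for predicate, obj in graph[EX + "John"]:
    print(predicate, "->", obj)
```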
Almost Everything is a URI
Why URIs?
• URIs live in a namespace controlled by the creator
• simple rule of thumb: only use your own namespace!
• minimal risk of overlapping names
• established reference mechanism
Caveat
• URI model does not prevent name clashes
• a URI is just a string and has no intrinsic meaning
RDF Graphs are Role Assertions
Translation:
(s, p, o) ↦ (s, o) : p
Blank Nodes and Literals
Literal Values
• used to describe ’atomic’ information about nodes
• can be typed: String, Integer, . . .
• cannot be de-referenced!
• (Formally: an extension of ALC with data types)
Blank Nodes
• Blank nodes live in a local namespace
• they represent nodes that aren’t globally visible
• blank nodes assert the existence of an object
• they don’t reveal the identity of this object
• blank nodes from different graphs must not be identified with each other when merging!
Blank Nodes: Example
Alice knows a person
• people:Alice foaf:knows _:x
• _:x rdf:type foaf:Person
Alice owns a car
• people:Alice example:owns _:x
• _:x rdf:type example:car
Problem of Careless Merging
Taking the union of both graphs without renaming the blank nodes gives unintended results:
‘There is a thing (x) which is both a person and a car, which (whom?) Alice both knows and owns’.
RDF Vocabularies
Vocabularies
An RDF-Vocabulary is just a set of URIs (used to structure a set of names)
Example 75. RDF itself is an RDF vocabulary and contains
• rdf:type – used for typing resources
• rdf:Property – distinguishes relations from objects
Example 76. Other RDF vocabularies:
• RDFS (RDF Schema), defines ’Resource’, ’Class’, ’Property’
• FOAF (friend of a friend), defining ’knows’, ’Person’, . . .
Multiple Datasets: Merging
Datasets can be merged
• Graphs from different sources can be merged
• Nodes with the same URI are considered identical
• Unlabelled nodes are kept separate
• No limitations on graphs that can be merged
• Any RDF can be merged with any other RDF
Remarks
• ‘nothing that has been said can be unsaid’
• merging is safe: creates larger graphs
Merging and Blank Nodes
Merging is Simple
• merging of RDF graphs is just taking their union
• identical URIs are identified
• blank nodes from different graphs must not be identified
Merging Mechanism
1. ensure that blank nodes in different data sets have different names
2. take the union of graphs and collapse nodes with the same names
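The two-step mechanism can be sketched as follows, using the Alice example. The sketch assumes that graphs are sets of string triples and that blank nodes are exactly the strings starting with `_:`; the fresh names `_:b0`, `_:b1`, ... are arbitrary.

```python
from itertools import count


def is_blank(node):
    """Blank nodes are written with the '_:' prefix (an encoding assumption)."""
    return isinstance(node, str) and node.startswith("_:")


def merge(*graphs):
    """Step 1: rename blank nodes apart per graph. Step 2: take the union."""
    fresh = count()
    merged = set()
    for graph in graphs:
        renaming = {}  # local to one graph, so equal names stay equal within it

        def rename(node):
            if not is_blank(node):
                return node
            if node not in renaming:
                renaming[node] = f"_:b{next(fresh)}"
            return renaming[node]

        for s, p, o in graph:
            merged.add((rename(s), p, rename(o)))
    return merged


g1 = {("people:Alice", "foaf:knows", "_:x"), ("_:x", "rdf:type", "foaf:Person")}
g2 = {("people:Alice", "example:owns", "_:x"), ("_:x", "rdf:type", "example:car")}

careless = g1 | g2      # one blank node: a person that is also a car
merged = merge(g1, g2)  # two distinct blank nodes
```

In the careless union a single blank node carries both types; after renaming, the person Alice knows and the car she owns remain separate nodes, while `people:Alice` is shared because URIs are global.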
So Far: Untyped Graphs
RDF Schema
• specifies typing for resources
• provides information about properties
• (ontologies in the small)
Uses for RDF Schemas
• define URIs for new properties
• define URIs for new classes
• define subclass (sub-property) relationship
• define ranges and domains of properties
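To illustrate what subclass, domain and range declarations buy us, here is a minimal forward-chaining sketch of three RDFS inference rules. The `ex:` names (`ex:Food`, `ex:myPizza`, `ex:thinCrust`) are hypothetical, and the full RDFS rule set is considerably larger than what is shown.

```python
def rdfs_closure(triples):
    """Apply subclass, domain and range rules until nothing new is derived."""
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # subClassOf is transitive
                if p == p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # instances of a subclass are instances of the superclass
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # the subject of a property belongs to the property's domain
                if s2 == p and p2 == "rdfs:domain":
                    new.add((s, "rdf:type", o2))
                # the object of a property belongs to the property's range
                if s2 == p and p2 == "rdfs:range":
                    new.add((o, "rdf:type", o2))
        if new <= facts:
            return facts
        facts |= new


schema = {
    ("ex:hasBase", "rdfs:domain", "ex:Pizza"),
    ("ex:hasBase", "rdfs:range", "ex:PizzaBase"),
    ("ex:Pizza", "rdfs:subClassOf", "ex:Food"),
}
data = {("ex:myPizza", "ex:hasBase", "ex:thinCrust")}

inferred = rdfs_closure(schema | data)
```

From a single data triple the schema lets us conclude that `ex:myPizza` is a pizza (and hence food) and that `ex:thinCrust` is a pizza base.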
FOAF example
Example 77. From the FOAF definition (in XML notation)
<rdfs:Class rdf:about="http://xmlns.com/foaf/0.1/Person"
rdfs:label="Person" rdfs:comment="A person." vs:term_status="stable">
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
....
<owl:disjointWith
rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
<owl:disjointWith
rdf:resource="http://xmlns.com/foaf/0.1/Project"/>
</rdfs:Class>
(Note the disjointness and typing information)
OWL: The Web Ontology Language
OWL is ALC in XML-Notation
• more powerful than RDFS
• used to define classes and their relationships
• machine-readable XML-format
• several versions: OWL-Lite, OWL-DL and OWL-Full
• see http://www.w3.org/TR/owl-features/
Example
Example 78. The Pizza Ontology in OWL:
<owl:ObjectProperty rdf:about="#hasBase">
<rdfs:domain rdf:resource="#Pizza"/>
<rdfs:range rdf:resource="#PizzaBase"/>
</owl:ObjectProperty>
(Note that domain and range are already expressible in RDFS)
<owl:Class rdf:about="#Pizza">
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="#hasBase"/>
<owl:someValuesFrom rdf:resource="#PizzaBase"/>
</owl:Restriction>
...
</owl:Class>
Ontologies and RDF Graphs
Main Purpose
• Ontology (or RDF Schema) provides vocabulary (types)
• Ontologies can be merged – need to maintain consistency!
• RDF data populates ontologies – need to perform inferencing!
Main Application: Querying Distributed Data
Ingredients:
• one or more ontologies (describing the vocabulary)
• one or more data sets (specifying the data)
Type of Query:
• often combines data from different graphs
• e.g. ‘all vegetarian pizzas’ queried on data from two delivery services
→ all data sets must be formulated using a common vocabulary
→ need to ‘know’ that, e.g., Margheritas are vegetarian pizzas
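The ‘all vegetarian pizzas’ query can be sketched as follows. Everything here is hypothetical: the class names, the two delivery services and their item identifiers are made up for illustration, and only `rdfs:subClassOf` reasoning is performed.

```python
# Shared vocabulary (ontology); all names are hypothetical
ontology = {
    ("pizza:Margherita", "rdfs:subClassOf", "pizza:VegetarianPizza"),
    ("pizza:Funghi", "rdfs:subClassOf", "pizza:VegetarianPizza"),
    ("pizza:VegetarianPizza", "rdfs:subClassOf", "pizza:Pizza"),
    ("pizza:Salami", "rdfs:subClassOf", "pizza:Pizza"),
}

# Two independent data sets that use the shared vocabulary
service_a = {("a:item1", "rdf:type", "pizza:Margherita"),
             ("a:item2", "rdf:type", "pizza:Salami")}
service_b = {("b:offer7", "rdf:type", "pizza:Funghi")}


def subclasses_of(cls, ontology):
    """All classes at or below cls in the rdfs:subClassOf hierarchy."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in ontology:
            if p == "rdfs:subClassOf" and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def instances_of(cls, ontology, *datasets):
    """Instances of cls across all data sets, using subclass knowledge."""
    wanted = subclasses_of(cls, ontology)
    data = set().union(*datasets)
    return sorted(s for s, p, o in data if p == "rdf:type" and o in wanted)


vegetarian = instances_of("pizza:VegetarianPizza", ontology, service_a, service_b)
print(vegetarian)  # ['a:item1', 'b:offer7']
```

Neither data set mentions `pizza:VegetarianPizza`; the answer is only found because both services use the same class names and the ontology relates them.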
SPARQL Queries
Anatomy of a SPARQL Query
• Prefix declarations, for abbreviating URIs
• Dataset definition, stating what RDF graph(s) are being queried
• A result clause, identifying what information to return from the query
• The query pattern, specifying what to query for in the underlying dataset
• Query modifiers, slicing, ordering, and otherwise rearranging query results
SPARQL Queries
Concrete Structure
# prefix declarations
PREFIX foo: <http://example.com/resources/>
...
# result clause
SELECT ...
# dataset definition (follows the result clause)
FROM ...
# query pattern
WHERE {
...
}
# query modifiers
ORDER BY ...
SPARQL Context
RDF Data
• queries executed against RDF datasets (that define a relational structure)
• can combine more than one data source
• links to other data sources can be followed
SPARQL endpoints
• generic endpoint: queries against any web-accessible data
• specific endpoint: hardwired against particular data set
Returned Data
• XML. SPARQL specifies an XML vocabulary for returning tables of results.
• other formats possible: JSON, RDF, HTML
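As a sketch of how a client might consume the XML result format, the following parses a small response with Python's standard library. The sample document is hand-written for illustration (not the output of a real endpoint); the namespace is the one used by the W3C SPARQL Query Results XML Format.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample response (an assumption, not real endpoint output)
SAMPLE = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="name"/>
    <variable name="mbox"/>
  </head>
  <results>
    <result>
      <binding name="name"><literal>Dirk Pattinson</literal></binding>
      <binding name="mbox"><literal>dirk@imperial.ac.uk</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_results(xml_text):
    """Return one dict per <result>, mapping variable names to values."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows


rows = parse_results(SAMPLE)
print(rows)
```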
Examples
Data Set (in N3 Syntax)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix people: <http://www.doc.ic.ac.uk/~dirk/people#> .
# Dirk Pattinson
people:dirk rdf:type foaf:Person .
people:dirk foaf:name "Dirk Pattinson" .
people:dirk foaf:title "Dr" .
people:dirk foaf:firstName "Dirk" .
people:dirk foaf:surname "Pattinson" .
people:dirk foaf:mbox "dirk@imperial.ac.uk" .
people:dirk foaf:based_near <http://dbpedia.org/resource/London> .
people:dirk foaf:knows _:john .
people:dirk rdfs:seeAlso <http://www.doc.ic.ac.uk/~dirk/more.n3> .
# John Doe (a blank node)
_:john a foaf:Person .
_:john foaf:name "John Doe".
Examples
Data Set (in XML Syntax)
<?xml version="1.0"?>
<rdf:RDF
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:people="http://www.doc.ic.ac.uk/~dirk/people#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<foaf:Person rdf:about="http://www.doc.ic.ac.uk/~dirk/people#dirk">
<foaf:name>Dirk Pattinson</foaf:name>
<foaf:title>Dr</foaf:title>
<foaf:firstName>Dirk</foaf:firstName>
<foaf:surname>Pattinson</foaf:surname>
<foaf:mbox>dirk@imperial.ac.uk</foaf:mbox>
<foaf:based_near rdf:resource="http://dbpedia.org/resource/London" />
<foaf:knows>
<foaf:Person>
<foaf:name>John Doe</foaf:name>
</foaf:Person>
</foaf:knows>
<rdfs:seeAlso rdf:resource="http://www.doc.ic.ac.uk/~dirk/more.n3" />
</foaf:Person>
</rdf:RDF>
Examples
Data Set (underlying Triples)
<http://www.doc.ic.ac.uk/~dirk/people#dirk>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>.
<http://www.doc.ic.ac.uk/~dirk/people#dirk>
<http://xmlns.com/foaf/0.1/name>
"Dirk Pattinson".
[ some triples omitted ]
<http://www.doc.ic.ac.uk/~dirk/people#dirk>
<http://xmlns.com/foaf/0.1/based_near>
<http://dbpedia.org/resource/London>.
<http://www.doc.ic.ac.uk/~dirk/people#dirk>
<http://xmlns.com/foaf/0.1/knows>
_:john.
_:john
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Person>.
_:john
<http://xmlns.com/foaf/0.1/name>
"John Doe".
Example Queries
Simple Search
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
?x foaf:name ?name .
?x foaf:mbox ?mbox .
}
Notes
• variables are denoted with ?
• matches data in the RDF graph against variables
• returns all matches
• formally: all individuals that are instances of the given concept
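The matching step can be sketched as a naive evaluator for a list of triple patterns. This is a simplification under stated assumptions: terms are plain strings, prefixed names are left unexpanded, and any pattern term starting with `?` is treated as a variable. Real SPARQL engines work quite differently internally.

```python
def is_var(term):
    """Pattern terms starting with '?' are variables."""
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    binding = dict(binding)  # never modify the caller's binding
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # bind the variable, or check it against its earlier value
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding


def match_patterns(patterns, graph):
    """All variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match_triple(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


# A few triples from the example data set
graph = [
    ("people:dirk", "foaf:name", "Dirk Pattinson"),
    ("people:dirk", "foaf:mbox", "dirk@imperial.ac.uk"),
    ("people:dirk", "foaf:knows", "_:john"),
    ("_:john", "foaf:name", "John Doe"),
]
query = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]

results = [(b["?name"], b["?mbox"]) for b in match_patterns(query, graph)]
print(results)  # [('Dirk Pattinson', 'dirk@imperial.ac.uk')]
```

John Doe has a name but no mailbox, so the second pattern cannot be satisfied for him and he does not appear in the result.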
Semantics of Matching
In Relation to ALC
• ontology (or RDF Schema) specifies a TBox
• triples (s, p, o) are translated to concepts
• the set of matches is given by the set of all instances
Example 79. Matching in a WHERE-clause, e.g.:
?x foaf:name ?name .
ALC-Semantics:
• x and name are individuals
• as a concept assertion: x ∶ ∃name.?name
• matches (x, name) where x is an instance of ∃foaf:name.name
• instance checking performed over given ontology.
More Example Queries
More Constraints: near London
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE {
?x foaf:name ?name .
?x foaf:mbox ?mbox .
?x foaf:based_near <http://dbpedia.org/resource/London>
}
Remark
• just as before, but with additional constraint
• note explicit URI in query
Federated Queries
People based near a city with a given name
• The name (London) is a literal, not a URI
• Use the DBpedia SPARQL endpoint in a sub-query to find the URI
Query Example
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbterm: <http://dbpedia.org/property/>
SELECT ?name ?mbox
WHERE {
SERVICE <http://dbpedia.org/sparql> {
?loc dbterm:name "London"@en
}
?x foaf:name ?name .
?x foaf:mbox ?mbox .
?x foaf:based_near ?loc .
}
Optional Components
Example Query
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?age
WHERE {
?x foaf:name ?name .
OPTIONAL { ?x foaf:age ?age . }
}
• returns information marked optional if available
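OPTIONAL behaves like a left outer join on solutions: every match of the required pattern is kept, and optional variables are bound only where a compatible match exists. A minimal sketch, with solutions as dictionaries and a hypothetical age for John Doe (the example data set contains no `foaf:age` triples):

```python
def left_join(required, optional, shared):
    """Keep every required solution; extend it when optional ones agree on the shared variables."""
    out = []
    for r in required:
        extensions = [o for o in optional if all(o[v] == r[v] for v in shared)]
        if extensions:
            out.extend({**r, **o} for o in extensions)
        else:
            out.append(dict(r))  # optional variables stay unbound
    return out


# Solutions of '?x foaf:name ?name'
required = [
    {"?x": "people:dirk", "?name": "Dirk Pattinson"},
    {"?x": "_:john", "?name": "John Doe"},
]
# Solutions of '?x foaf:age ?age' (the age value is made up)
optional = [{"?x": "_:john", "?age": 42}]

rows = [(s["?name"], s.get("?age")) for s in left_join(required, optional, ["?x"])]
print(rows)  # [('Dirk Pattinson', None), ('John Doe', 42)]
```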
Query Modifiers
Example Query
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?age
WHERE {
?x foaf:name ?name .
OPTIONAL { ?x foaf:age ?age . }
}
LIMIT 20
• returns at most 20 results
More Query Modifiers
• e.g. ORDER BY ?name – specifies the ordering of results by name
• e.g. OFFSET 20 – skips the first 20 matches and returns the 21st and subsequent ones
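On a list of solutions, the modifiers amount to sorting and slicing, applied in the order ORDER BY, OFFSET, LIMIT. A small sketch with 25 made-up solutions (the names `person00` to `person24` are hypothetical):

```python
def apply_modifiers(solutions, order_by=None, offset=0, limit=None):
    """Apply ORDER BY, then OFFSET, then LIMIT to a list of solutions."""
    solutions = list(solutions)
    if order_by is not None:
        solutions = sorted(solutions, key=lambda s: s[order_by])
    end = None if limit is None else offset + limit
    return solutions[offset:end]


# 25 hypothetical solutions, deliberately stored in reverse order
solutions = [{"?name": f"person{i:02d}"} for i in reversed(range(25))]

first_page = apply_modifiers(solutions, order_by="?name", limit=20)
second_page = apply_modifiers(solutions, order_by="?name", offset=20, limit=20)
print(len(first_page), len(second_page))  # 20 5
```

Combining OFFSET and LIMIT in this way gives simple paging; without ORDER BY the order of results is not guaranteed, so pages may overlap or miss matches.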
References and Further Reading: RDF
RDF
We have only described RDF very briefly.
• W3C RDF Specification: http://www.w3.org/TR/REC-rdf-syntax/
• a tutorial (beware: lots of advertising): http://www.w3schools.com/rdf/default.asp
References and Further Reading: SPARQL
SPARQL
• W3C specification: http://www.w3.org/TR/rdf-sparql-query/
• a tutorial: http://eneumann.org/talks/Sparql_tutorial.html
• Cambridge Semantics Tutorial: http://www.cambridgesemantics.com/2008/09/sparql-by-examp
• Jena SPARQL tutorial: http://jena.sourceforge.net/ARQ/Tutorial/