Publishing to the Semantic Web

Dr Owen Conlan
Dr Alexander O’Connor
The Long Road for Semantic Web
• 1992: The World Wide Web (HTTP, HTML)
• 1998/9: The Semantic Web Vision (RDF, Semantic Web Stack)
• 2003: Web 2.0 / The Social Web (Tagging, Crowd Sourcing)
• 2006: The Web of Data (Linked Data) (VoID, DBpedia)
• ????: Semantic Web (Reasoning; Logic, Rules; Trust)
The Linked Data Movement
• Driven by Tim Berners-Lee
• “Linked Data uses a small slice of the technologies that make up the Semantic Web”
• Treat Schemas as Vocabularies
• Reuse existing schemas
Linking Open Data Project
• Community project with W3C support started in
early 2007
• Idea: take existing (open) data sets and make them
available on the Web in RDF
• Interlink them with other data sets
A Pretty (Scary) Diagram
DBpedia
• Transforming Wikipedia into a knowledge base
• Structure from
  • Infoboxes
  • HTML (titles)
  • Categories, Links
  • other languages, redirects, disambiguations, etc
• Uses: as a controlled vocab, as an ontology
Check out: http://dbpedia.org/page/Trinity_College,_Dublin
(or Google “Trinity College Dublin dbpedia”)
Linking Data
• Publish structured data in RDF on the web using
URIs and shared vocabularies rather than the
traditional Semantic Web focus on ontologies and
inference
• Lowers barriers to entry
• Fosters widespread adoption
• Mature tools, techniques, patterns
Linked Data Principles
• Formulated by Tim Berners-Lee (2006):
1. Use URIs as names for things
2. Use HTTP URIs so that people/apps can look up these names
3. When someone/an app looks up a URI, provide useful
information
4. Include links to other URIs so that they can discover
more things
• This is not an unambiguous specification, just a set of principles.
http://what.is.http.org/ask?question
• The Hypertext Transfer Protocol (HTTP) is an
application-level protocol for distributed,
collaborative, hypermedia information systems. It is
a generic, stateless, protocol which can be used for
many tasks beyond its use for hypertext, such as
name servers and distributed object management
systems, through extension of its request methods,
error codes and headers. A feature of HTTP is the
typing and negotiation of data representation,
allowing systems to be built independently of the
data being transferred. [RFC2616]
URIs
• A Uniform Resource Identifier (URI) is a compact
sequence of characters that identifies an abstract or
physical resource [RFC3986]
• Syntax: URI = scheme “:” hier-part [“?” query] [“#”
fragment]
• Example
Note: a scheme is not the same as a protocol
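The syntax rule above can be explored with Python's standard library. This is a minimal sketch: the helper name `uri_components` and both sample URIs (one built on the `example.uk` host used in these slides, one `urn:` identifier) are illustrative assumptions, not taken from the original deck. The second URI shows a scheme that does not name a network protocol.

```python
from urllib.parse import urlsplit

def uri_components(uri):
    """Split a URI into the parts named in the RFC 3986 syntax rule."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # urlsplit reports the authority and path halves of hier-part separately
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }

# Illustrative URIs (assumptions for this sketch)
web = uri_components("http://example.uk/people?name=dave#davesmith")
book = uri_components("urn:isbn:0451450523")

print(web)
print(book)
```

The `urn` scheme identifies a resource but says nothing about how to retrieve it, which is why Linked Data insists on HTTP URIs.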
Identifying Linked Data Resources
• Linked data needs dereferenceable URIs (ones we
can use HTTP to retrieve a description of that
resource)
• But we cannot serialise people, things over the
internet (yet?) => we publish RDF documents on the
web that describe them
• A real-world object != a document about that object
• e.g. creation-date for you != creation-date for your webpage
Identifying Linked Data Resources
• URI that identifies a real-world object != URI that identifies a document about that object
• Can make statements about object and can make
statements about the document describing it
How do we link these 2 URIs together?
URI styles for Linked Data
• 303 Redirect (e.g. http://example.uk/people/davesmith)
• Used for large, dynamic data sets
• Flexible because redirection can be separately configured
for each resource
• e.g. can store data in multiple files or DB. Can change this
at deployment/run-time.
• Typically used for resource descriptions in large data-sets
URI styles for Linked Data
• Fragment (e.g. http://example.uk/people#davesmith)
• Used for small, static data sets
• Reduced number of HTTP round-trips => reduced latency
• A single HTTP request retrieves the entire document
• May transmit unnecessary data across the web
• Used for RDFa (defined via RDFa “about=” attribute)
• Typically used for vocabulary definitions
303 Redirect Approach
1. Create URIs for concept/thing and documents
  • e.g. http://biglynx.co.uk/people/dave-smith (URI identifying the person Dave Smith)
  • http://biglynx.co.uk/people/dave-smith.rdf (URI for RDF/XML document describing Dave Smith)
  • http://biglynx.co.uk/people/dave-smith.html (URI for HTML document describing Dave Smith)
2. Use HTTP redirects/content negotiation to access the desired resource description for the specific user agent
  1. Client sends an HTTP GET request on a URI identifying an object
  2. Server recognizes the URI and answers with an HTTP 303 response carrying the URI of a description of the object
  3. Client sends an HTTP GET request on the new URI
  4. Server sends the document from the new URI
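The server side of this exchange might look roughly like the following sketch, built on Python's standard `http.server`. The `THINGS` set, the `negotiate` helper, the `run` function and the port are assumptions made for illustration; the content negotiation is deliberately naive (it ignores q-values), and the handler only issues the 303 redirect and does not serve the `.rdf` or `.html` documents themselves.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical set of "thing" URIs (as server paths) known to this server.
THINGS = {"/people/dave-smith"}

def negotiate(path, accept_header):
    """Return the document path to redirect to, or None for an unknown thing."""
    if path not in THINGS:
        return None
    # Naive content negotiation: RDF/XML if the client asks for it, else HTML.
    if "application/rdf+xml" in accept_header:
        return path + ".rdf"
    return path + ".html"

class ThingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = negotiate(self.path, self.headers.get("Accept", ""))
        if target is None:
            self.send_error(404)
            return
        # 303 See Other: the thing itself cannot be sent, so point at a description.
        self.send_response(303)
        self.send_header("Location", target)
        self.send_header("Vary", "Accept")
        self.end_headers()

def run(port=8000):
    """Start the sketch server (blocks until interrupted)."""
    HTTPServer(("localhost", port), ThingHandler).serve_forever()
```

Calling `run()` and requesting `/people/dave-smith` with `Accept: application/rdf+xml` should produce a 303 response whose `Location` header ends in `.rdf`.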
Huh?
• [Figure: dereferencing an HTTP URI that identifies a non-information resource, combined with content negotiation]
• Simples…
Fragment Approach
1. Assign a URI to the RDF document defining the concepts
  • e.g. http://biglynx.co.uk/vocab/sme (document URI)
2. Assign fragment identifiers to concepts within the document
  • e.g. http://biglynx.co.uk/vocab/sme#SmallMediumEnterprise
  • http://biglynx.co.uk/vocab/sme#Team
3. Use HTTP requests to get the description
  1. Client truncates the fragment URI so that it refers to just the document
  2. Client sends an HTTP GET to request the document
  3. Server sends back the full document
  4. Linked data application inspects the triples to find the fragment
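The client-side steps can be sketched with the standard library's `urldefrag`. The triples below are a hypothetical, hand-written stand-in for what a real application would get by fetching and parsing the vocabulary document; the helper names `document_uri` and `describe` are likewise assumptions for this example.

```python
from urllib.parse import urldefrag

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"

# Hypothetical, already-parsed triples standing in for the fetched document.
DOCUMENT_TRIPLES = [
    ("http://biglynx.co.uk/vocab/sme#SmallMediumEnterprise", RDF_TYPE, RDFS_CLASS),
    ("http://biglynx.co.uk/vocab/sme#Team", RDF_TYPE, RDFS_CLASS),
]

def document_uri(fragment_uri):
    """Step 1: drop the fragment; only the document URI is sent over HTTP."""
    return urldefrag(fragment_uri).url

def describe(fragment_uri, triples):
    """Step 4: keep the triples whose subject is the full fragment URI."""
    return [t for t in triples if t[0] == fragment_uri]

team = "http://biglynx.co.uk/vocab/sme#Team"
print(document_uri(team))
print(describe(team, DOCUMENT_TRIPLES))
```

The fragment never reaches the server, which is why one request returns every concept defined in the document.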
Now we can refer to stuff
• Class
<!-- http://www.pizza.com/ontologies/pizza.owl#ThinAndCrispyBase -->
<owl:Class rdf:about="&pizza;ThinAndCrispyBase">
    <rdfs:subClassOf rdf:resource="&pizza;PizzaBase"/>
</owl:Class>
• Property
<!-- http://www.pizza.com/ontologies/pizza.owl#hasIngredient -->
<owl:ObjectProperty rdf:about="&pizza;hasIngredient">
    <rdf:type rdf:resource="&owl;TransitiveProperty"/>
    <owl:inverseOf rdf:resource="&pizza;isIngredientOf"/>
</owl:ObjectProperty>
5 Steps to Publishing Linked Data
1. Understand the Principles
2. Understand your Data
3. Choose URIs for Things in your Data
4. Set up Your Infrastructure
5. Link to other Data Sets
Step 1: Understanding the Principles
A. Use URIs as names for things
  • Anything, not just documents
  • You are not your homepage
  • Information resources (can be transmitted electronically) and non-information resources (cannot be transmitted electronically, e.g. a person!)
B. Use HTTP URIs
  • Globally unique names, distributed ownership
  • Allows people to look up those names
Step 1: Understanding the Principles
(cont.)
C. Provide useful information in RDF when someone looks up a URI
  • We can include RDF triple statements!
D. Include RDF links to other URIs
  • To enable discovery of related information e.g. via “follow your nose” browsing
  • Relationship Links – to add context
  • Identity Links – for URI aliases in other sources
  • Vocabulary Links – to enable self-description
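As a rough illustration of the three kinds of RDF link, the sketch below writes one triple of each kind and serialises them as N-Triples. The subject URI comes from the Big Lynx example used in these slides; the two link targets (a DBpedia resource and an `example.org` alias) and the helper `to_ntriples` are assumptions chosen for the example, not data from the original deck.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
OWL = "http://www.w3.org/2002/07/owl#"

DAVE = "http://biglynx.co.uk/people/dave-smith"

links = [
    # Vocabulary link: points at the definition of the term used to describe Dave
    (DAVE, RDF + "type", FOAF + "Person"),
    # Relationship link: adds context from another data set (hypothetical target)
    (DAVE, FOAF + "based_near", "http://dbpedia.org/resource/Berlin"),
    # Identity link: another URI for the same person (hypothetical alias)
    (DAVE, OWL + "sameAs", "http://example.org/staff/dsmith"),
]

def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(to_ntriples(links))
```

This serialiser only handles triples whose object is a URI; literal values would need quoting and escaping.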
Step 2: Understand your Data
• What are the key things in your data?
  • People
  • Places
  • Events
  • Books
  • Films
  • Musicians
  • …
• This is why domain expertise is critically important
Step 2: Understand your Data (cont.)
• What vocabularies can be used to describe these?
• Principles:
  • Reuse, don’t reinvent
  • Mix liberally
• Examples:
  • foaf -- Friend-of-a-Friend ontology
  • geonames -- GeoNames ontology
  • skos -- Simple Knowledge Organization System
• ckan.net
Step 2: Common Vocabularies
• bibo -- Bibliographic ontology
• cc -- Creative Commons ontology
• damltime -- Time Zone ontology
• doap -- Description of a Project ontology
• event -- Event ontology
• foaf -- Friend-of-a-Friend ontology
• frbr -- Functional Requirements for Bibliographic Records
• geo -- Geo wgs84 ontology
• geonames -- GeoNames ontology
• mo -- Music Ontology
• opencyc -- OpenCyc knowledge base
• owl -- Web Ontology Language
• pim_contact -- PIM (personal information management) Contacts ontology
• po -- Programmes Ontology (BBC)
• rss -- RDF Site Summary (RSS 1.0) ontology
• sioc -- Socially Interlinked Online Communities ontology
• sioc_types -- SIOC extension
• skos -- Simple Knowledge Organization System
• umbel -- Upper Mapping and Binding Exchange Layer ontology
• wordnet -- WordNet lexical ontology
• yandex_foaf -- FOAF (Friend-of-a-Friend) Yandex extension ontology
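Reusing a vocabulary in practice means writing its terms as full URIs, usually through a prefix. The sketch below expands prefixed names for a few of the vocabularies listed above. The namespace URIs are the published ones for those vocabularies; the `PREFIXES` table and the `expand` helper are assumptions for this example.

```python
# Namespace URIs for a few of the vocabularies listed above.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

def expand(prefixed_name):
    """Turn a prefixed name such as 'foaf:name' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local

# "Mix liberally": terms from several vocabularies can describe one resource.
for term in ("foaf:name", "geo:lat", "skos:prefLabel"):
    print(term, "->", expand(term))
```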
Step 3: Choosing URIs
• Use HTTP URIs
• Keep out of other people’s namespaces
  • Create own URI and include alias information
• Abstract away from implementation details:
  • http://dbpedia.org/resource/Berlin
• Is better than this:
  • http://www4.wiwisss.fu-berlin.de:2020/demos/dbpedia/cgi-bin/resource.php?id=/Berlin
• Use Natural Keys within URIs:
  • Need to ensure the uniqueness of URIs
  • Useful to base them on some existing primary key
  • Whenever possible, use a key that is meaningful within the domain of the data set, e.g. use the ISBN as part of the URI of a book
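The natural-key advice can be sketched as a small URI-minting function. The base namespace `http://example.org/book/`, the function name `book_uri` and the sample ISBN are all hypothetical; the length check is only a sanity test and does not verify ISBN check digits.

```python
from urllib.parse import quote

# Hypothetical namespace under the publisher's own control.
BASE = "http://example.org/book/"

def book_uri(isbn):
    """Mint a URI for a book from its ISBN, used as a natural key."""
    # Normalise so that '978-0-262-03384-8' and '9780262033848' give one URI.
    key = "".join(ch for ch in isbn if ch.isalnum()).upper()
    if len(key) not in (10, 13):
        raise ValueError(f"not an ISBN-10 or ISBN-13: {isbn!r}")
    return BASE + quote(key)

print(book_uri("978-0-262-03384-8"))
```

Normalising the key before minting matters for uniqueness: two spellings of the same ISBN should never yield two URIs for one book.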
Step 3: Choosing URIs (cont.)
• Common patterns for URIs:
  • http://dbpedia.org/resource/Berlin (Thing)
  • http://dbpedia.org/data/Berlin (RDF)
  • http://dbpedia.org/page/Berlin (HTML)
• Or use the file name extension:
  • http://biglynx.co.uk/people/dave-smith
  • http://biglynx.co.uk/people/dave-smith.html
  • http://biglynx.co.uk/people/dave-smith.rdf
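Both patterns amount to a fixed mapping from the URI of a thing to the URIs of its RDF and HTML documents. A minimal sketch follows; the function names `path_style` and `extension_style` are assumptions, and the mapping is inferred from the example URIs on this slide rather than from any server configuration.

```python
def path_style(thing_uri):
    """DBpedia-like pattern: /resource/ for the thing, /data/ and /page/ for documents."""
    return {
        "thing": thing_uri,
        "rdf": thing_uri.replace("/resource/", "/data/", 1),
        "html": thing_uri.replace("/resource/", "/page/", 1),
    }

def extension_style(thing_uri):
    """File-extension pattern: append .rdf or .html to the thing's URI."""
    return {
        "thing": thing_uri,
        "rdf": thing_uri + ".rdf",
        "html": thing_uri + ".html",
    }

print(path_style("http://dbpedia.org/resource/Berlin"))
print(extension_style("http://biglynx.co.uk/people/dave-smith"))
```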
Step 4: Set up Your Infrastructure
• Describe the data set!
  • e.g. dataset name, authorship, updates, licensing terms, crawler support, SPARQL endpoint location, ...
• Vocabulary of Interlinked Datasets (VoID)
  • A little later…
• Pick a Publication Pattern
  • Is your input data: queryable, structured or text?
  • What is the data volume?
  • Is it static or dynamic?
• Test it
Step 5: Linking
• Popular predicates for linking
  • owl:sameAs
  • foaf:depiction
  • foaf:homepage
  • foaf:topic
  • foaf:based_near
  • foaf:maker / foaf:made
  • foaf:page
  • foaf:primaryTopic
  • rdfs:seeAlso
Step 5: Linking (cont.)
• VoID (from "Vocabulary of Interlinked Datasets") is an RDF-based schema to describe linked datasets
• A dataset is a collection of data, published and maintained by a single provider, available as RDF, and accessible, for example, through dereferenceable HTTP URIs or a SPARQL endpoint
http://semanticweb.org/wiki/VoiD
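A VoID description is itself just a handful of RDF triples about the dataset. The sketch below builds one and prints it as N-Triples. The VoID and Dublin Core terms used (`void:Dataset`, `void:sparqlEndpoint`, `void:uriSpace`, `void:exampleResource`, `dcterms:title`) are real vocabulary terms; the dataset URI, title and SPARQL endpoint are hypothetical values invented for illustration.

```python
VOID = "http://rdfs.org/ns/void#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical dataset URI for this sketch.
DATASET = "http://biglynx.co.uk/datasets/people"

def uri(value):
    """Format a URI as an N-Triples object."""
    return f"<{value}>"

def literal(value):
    """Format a plain string literal, escaping backslashes and quotes only."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

description = [
    (DATASET, RDF_TYPE, uri(VOID + "Dataset")),
    (DATASET, DCTERMS + "title", literal("Big Lynx people")),                      # hypothetical
    (DATASET, VOID + "sparqlEndpoint", uri("http://biglynx.co.uk/sparql")),        # hypothetical
    (DATASET, VOID + "uriSpace", literal("http://biglynx.co.uk/people/")),
    (DATASET, VOID + "exampleResource", uri("http://biglynx.co.uk/people/dave-smith")),
]

def to_ntriples(rows):
    """Serialise (subject, predicate, formatted object) rows as N-Triples."""
    return "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in rows)

print(to_ntriples(description))
```

The `literal` helper does not escape newlines or other control characters, so it is only suitable for short single-line values like these.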
1. Understand the Principles
2. Understand your Data
3. Choose URIs for Things in your Data
4. Set up Your Infrastructure
5. Link to other Data Sets
Thank you!
Owen.Conlan@scss.tcd.ie
References
1. http://linkeddata.org
2. Debugging Semantic Web sites with cURL, http://dowhatimean.net/2007/02/debugging-semantic-web-sites-with-curl
3. Linked Data Tutorial, http://www.slideshare.net/mediasemanticweb/linked-data-michael-hausenblas-2009-03-05
4. Linked Data Applications, M Hausenblas, DERI Technical Report 2009
5. Linked Data: Evolving the Web into a Global Data Space, Tom Heath, Christian Bizer, http://linkeddatabook.com