The Semantic Web - Bosatsu Consulting

The Semantic Web
Brian Sletten
@bsletten
04/02/2014
Speaker Qualifications
· Specialize in next-generation technologies
· Author of "Resource-Oriented Architecture Patterns for Webs of Data"
· One of Top 100 Semantic Web People
3/147
Agenda
· Introduction
· REST
· RDF
· SPARQL
· RDFa
· R2RML
· Linked Data
4/147
Introduction
Where We Do Data Integration
· Databases
· Code
· Services
· Data Warehouses
6/147
“ If we had individual words to represent every
particularity we would have to have an infinite
number of them, which would exceed our
capability of learning, recalling and
manipulating them. ”
Daniel Chandler
“ [i]f words had the job of representing concepts
fixed in advance, one would be able to find
exact equivalents for them as between one
language and another. But this is not the case. ”
Ferdinand de Saussure
“ backpfeifengesicht ”
10/147
“ There are no 'natural' concepts or categories
which are simply 'reflected' in language.
Language plays a crucial role in 'constructing
reality'. ”
Daniel Chandler
12/147
http://www.flickr.com/photos/75467759@N00/4880012589
http://www.flickr.com/photos/iamart3/478545143
http://www.flickr.com/photos/68901280@N00/5250299238
REST
http://amundsen.com/media-types/collection/
21/147
application/vnd.collection+json
http://amundsen.com/blog/
22/147
Collection+JSON
JSON
{ "collection" :
{
"version" : "1.0",
"href" : "http://example.org/friends/"
}
}
23/147
Collection+JSON
JSON
{ "collection" :
{
"version" : "1.0",
"href" : "http://example.org/friends/",
"links" : [],
"items" : [],
"queries" : [],
"template" : {},
"error" : {}
}
}
24/147
http://example.org/api
JSON
{
"collection": {
"version": "1.0",
"href": "http://example.org/api",
"links": [{
"rel": "account",
"href": "http://example.org/account"
}, {
"rel": "order",
"href": "http://example.org/order"
}, {
"rel": "product",
"href": "http://example.org/product"
}]
}
}
25/147
http://example.org/account
JSON
{
"collection": {
"version": "1.0",
"href": "http://example.org/account",
"links": [{
"rel": "next",
"href": "http://example.org/account;page=2"
}],
"items": [],
"queries": [],
"template": {}
}
}
26/147
http://example.org/account;page=2
JSON
{
"collection": {
"version": "1.0",
"href": "http://example.org/account;page=2",
"links": [{
"rel": "prev",
"href": "http://example.org/account"
},
{
"rel": "next",
"href": "http://example.org/account;page=3"
}],
"items": [],
"queries": [],
"template": {}
}
}
27/147
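The paging slides suggest a client that never builds page URLs itself and only follows the `next` link it is given. A minimal sketch in Python, assuming canned documents in a dictionary (`PAGES`, a hypothetical stand-in for HTTP GETs) and a third page that is not shown on the slides:

```python
# Canned Collection+JSON documents keyed by URL; this stands in for HTTP GETs.
# The page=3 document is an assumption added so that the walk terminates.
PAGES = {
    "http://example.org/account": {"collection": {
        "version": "1.0",
        "href": "http://example.org/account",
        "links": [{"rel": "next", "href": "http://example.org/account;page=2"}],
        "items": []}},
    "http://example.org/account;page=2": {"collection": {
        "version": "1.0",
        "href": "http://example.org/account;page=2",
        "links": [{"rel": "prev", "href": "http://example.org/account"},
                  {"rel": "next", "href": "http://example.org/account;page=3"}],
        "items": []}},
    "http://example.org/account;page=3": {"collection": {
        "version": "1.0",
        "href": "http://example.org/account;page=3",
        "links": [{"rel": "prev", "href": "http://example.org/account;page=2"}],
        "items": []}},
}


def link_for(doc, rel):
    """Return the href of the first collection-level link with this rel, or None."""
    for link in doc["collection"].get("links", []):
        if link.get("rel") == rel:
            return link["href"]
    return None


def walk(start, fetch):
    """Yield each page URL reached by following rel="next" links from start."""
    url, seen = start, set()
    while url is not None and url not in seen:
        seen.add(url)
        yield url
        url = link_for(fetch(url), "next")


visited = list(walk("http://example.org/account", PAGES.__getitem__))
print(visited)
```

The client only knows the entry URL and the meaning of `next`; the server remains free to change how page URLs are spelled.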
http://example.org/account
JSON
{
"collection": {
"version": "1.0",
"href": "http://example.org/account",
...
"items": [
],
...
}
}
28/147
http://example.org/account
JSON
{
  ...
  "items": [ {
    "href": "/account/id/9468",
    "data": [
      { "name": "username", "value": "bob" },
      { "name": "id", "value": "9468" }
    ],
    "links": [
      { "rel": "open", "href": "/order/account/id/9468;status=open" },
      { "rel": "recent", "href": "/order/account/id/9468;status=recent" }
    ]
  }]
  ...
}
29/147
http://example.org/account
JSON
{
"collection": {
"version": "1.0",
"href": "http://example.org/account",
...
"queries": [
],
...
}
}
30/147
http://example.org/account
JSON
{
  ...
  "queries": [ {
    "encoding": "uri-template",
    "rel" : "search",
    "href" : "/account{;status,page,ipp}",
    "data": [
      { "name": "status", "value": "" },
      { "name": "page", "value": "" },
      { "name": "ipp", "value" : "" }
    ]
  }]
  ...
}
31/147
http://example.org/account;status=open;page=2
JSON
{
  ...
  "queries": [ {
    "encoding": "uri-template",
    "rel" : "search",
    "href" : "/account{;status,page,ipp}",
    "data": [
      { "name": "status", "value": "open" },
      { "name": "page", "value": "2" },
      { "name": "ipp", "value" : "" }
    ]
  }]
  ...
}
32/147
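The query template `/account{;status,page,ipp}` uses the semicolon (path-style parameter) form of URI Templates. A small sketch of how a client might expand it, assuming that a blank value means "not supplied" (strict RFC 6570 expansion would emit a bare `;ipp` for an empty string) and using a hypothetical helper name `expand_path_params`:

```python
import re
from urllib.parse import quote


def expand_path_params(template, values):
    """Expand {;a,b,c} expressions into ;a=1;b=2 path-style parameters."""
    def replace(match):
        parts = []
        for name in match.group(1).split(","):
            value = values.get(name)
            if value is None or value == "":
                continue  # assumption: blank means the parameter was not supplied
            parts.append(";" + name + "=" + quote(str(value), safe=""))
        return "".join(parts)

    return re.sub(r"\{;([^}]*)\}", replace, template)


filled = expand_path_params(
    "/account{;status,page,ipp}",
    {"status": "open", "page": "2", "ipp": ""},
)
print("http://example.org" + filled)
```

With the values from the slide this yields the same URL shown as the slide title; with no values at all the template collapses to `/account`.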
RDF
Resource Description Framework (RDF)
· W3C Standard
· Graph-oriented
· URIs to identify subjects *AND* relationships
· SPARQL Protocol and RDF Query Language
34/147
Everything You Know About Something
ID      Col1    Col2    Col3    Col4    Col5    Col6    ....  ColN
Thing1  Value1  Value2  Value3  Value4  Value5  Value6  ....  ValueN
35/147
Everything You Know About Everything
[Table: sparse grid of rows Thing1, Thing2, Thing3, Thing4, ... against columns ID, Col1, Col2, Col3, Col4, Col5, Col6, ...., ColN; Thing4 has every cell filled (Value1 ... ValueN), the other rows only some]
36/147
Distribute Rows in their Entirety
[Table: the same grid with only rows Thing1 and Thing3 kept, each with all of its columns (ID, Col1 ... Col6, ...., ColN)]
37/147
Distribute Columns in their Entirety
[Table: the same grid with only columns ID, Col2, Col3, Col5 and ColN kept, for rows Thing1, Thing3, Thing4, ...]
38/147
Distribute Arbitrary Cells
[Table: the same grid with an arbitrary selection of cells kept, drawn from columns Col2, Col3, Col5 and ColN and rows Thing1, Thing3, Thing4, ...]
39/147
URIs to identify Rows/Cols
· Global
· Interoperable
· Decentralized
· Resolvable
45/147
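One way to read the table slides: once rows and columns both have global names, every cell can travel on its own as a (row URI, column URI, value) statement. A rough sketch, assuming a hypothetical `http://example.org/` namespace and a hypothetical `row_to_triples` helper:

```python
BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def row_to_triples(row_id, cells, base=BASE):
    """Turn one table row into (subject, predicate, object) triples.

    The row ID becomes the subject URI, each column name becomes a
    predicate URI, and empty cells (None) simply produce no triple.
    """
    subject = base + "thing/" + row_id
    return [
        (subject, base + "col/" + column, value)
        for column, value in cells.items()
        if value is not None
    ]


triples = row_to_triples("Thing1", {"Col1": "Value1", "Col2": "Value2", "Col3": None})

# Any subset of cells can be published separately and merged again later,
# because each triple carries its own row and column identifiers.
merged = set(triples[:1]) | set(triples[1:])
for triple in sorted(merged):
    print(triple)
```

Sparse rows cost nothing in this form, which is why arbitrary cells can be distributed without agreeing on a shared table layout first.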
SPARQL
RDFa
RDFa Lite
http://www.w3.org/TR/rdfa-lite/
48/147
(Basically) Unstructured Text
Published (theoretically) at http://example.com/src.html
<p>
My name is Manu Sporny and you can give me a ring via 1-800-555-0199.
</p>
HTML
49/147
Identifying Vocabulary
<p vocab="http://schema.org/">
My name is Manu Sporny and you can give me a ring via 1-800-555-0199.
</p>
HTML
50/147
Class Instance
<p vocab="http://schema.org/" typeof="Person">
My name is Manu Sporny and you can give me a ring via 1-800-555-0199.
</p>
HTML
51/147
Generated Triples
TURTLE
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://example.com/src.html>
rdfa:usesVocabulary schema: .
_:1
rdf:type schema:Person .
52/147
Adding Properties
<p vocab="http://schema.org/" typeof="Person">
My name is
<span property="name">Manu Sporny</span>
and you can give me a ring via
<span property="telephone">1-800-555-0199</span>
or visit
<a property="url" href="http://manu.sporny.org/">my homepage</a>.
</p>
HTML
53/147
Generated Triples
TURTLE
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://example.com/src.html>
rdfa:usesVocabulary schema: .
_:1
rdf:type schema:Person;
schema:name "Manu Sporny";
schema:telephone "1-800-555-0199";
schema:url <http://manu.sporny.org/> .
54/147
Adding Resource Identifier
<p vocab="http://schema.org/" resource="#manu" typeof="Person">
My name is
<span property="name">Manu Sporny</span>
and you can give me a ring via
<span property="telephone">1-800-555-0199</span>.
<img property="image" src="http://manu.sporny.org/images/manu.png" />
</p>
HTML
55/147
Generated Triples
TURTLE
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://example.com/src.html>
rdfa:usesVocabulary schema: .
<http://example.com/src.html#manu>
rdf:type schema:Person;
schema:name "Manu Sporny";
schema:telephone "1-800-555-0199";
schema:image <http://manu.sporny.org/images/manu.png> .
56/147
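To make the mapping from attributes to triples concrete, here is a deliberately small sketch built on Python's standard `html.parser`. It is not a conformant RDFa processor: it assumes flat markup like the slide (one `vocab`, one `typeof` subject, no nesting of `property` elements, no `prefix` handling), it does not emit the `rdfa:usesVocabulary` triple, and the class name `RdfaLiteSketch` is made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfaLiteSketch(HTMLParser):
    """Collect (subject, predicate, object) tuples from flat RDFa Lite markup."""

    def __init__(self, base):
        super().__init__()
        self.base = base          # URL the page was (theoretically) published at
        self.vocab = ""
        self.subject = None
        self.triples = []
        self.pending = None       # predicate whose text content is being collected
        self.pending_tag = None
        self.text = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "vocab" in a:
            self.vocab = a["vocab"]
        if "typeof" in a:
            # resource="#manu" names the subject; without it, fall back to a blank node
            self.subject = urljoin(self.base, a["resource"]) if "resource" in a else "_:1"
            self.triples.append((self.subject, "rdf:type", self.vocab + a["typeof"]))
        if "property" in a:
            predicate = self.vocab + a["property"]
            target = a.get("href") or a.get("src")
            if target:
                # href/src values become resource objects
                self.triples.append((self.subject, predicate, urljoin(self.base, target)))
            else:
                # otherwise the element's text becomes a literal object
                self.pending, self.pending_tag, self.text = predicate, tag, []

    def handle_data(self, data):
        if self.pending is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending is not None and tag == self.pending_tag:
            self.triples.append((self.subject, self.pending, "".join(self.text).strip()))
            self.pending = None


HTML = """<p vocab="http://schema.org/" resource="#manu" typeof="Person">
My name is <span property="name">Manu Sporny</span>
and you can give me a ring via <span property="telephone">1-800-555-0199</span>.
<img property="image" src="http://manu.sporny.org/images/manu.png" />
</p>"""

parser = RdfaLiteSketch("http://example.com/src.html")
parser.feed(HTML)
parser.close()
for triple in parser.triples:
    print(triple)
```

A real deployment would use a proper RDFa library; the point here is only that `vocab`, `typeof`, `resource` and `property` each contribute one piece of a triple.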
Multiple Vocabularies
HTML
<p vocab="http://schema.org/" prefix="ov: http://open.vocab.org/terms/"
resource="#manu" typeof="Person">
My name is
<span property="name">Manu Sporny</span>
and you can give me a ring via
<span property="telephone">1-800-555-0199</span>.
<img property="image" src="http://manu.sporny.org/images/manu.png" />
My favorite animal is the <span property="ov:preferredAnimal">Liger</span>.
</p>
57/147
Generated Triples
TURTLE
@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
@prefix schema: <http://schema.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://example.com/src.html>
rdfa:usesVocabulary schema: .
<http://example.com/src.html#manu>
rdf:type schema:Person;
schema:name "Manu Sporny";
schema:telephone "1-800-555-0199";
schema:image <http://manu.sporny.org/images/manu.png>;
<http://open.vocab.org/terms/preferredAnimal> "Liger" .
58/147
Demo
http://www.flickr.com/photos/61107193@N03/8964142468
R2RML
R2RML
http://www.w3.org/TR/r2rml/
61/147
Employee Table
EMPNO  ENAME  JOB    DEPTNO
7369   SMITH  CLERK  10
62/147
Department Table
DEPTNO  DNAME      LOC
10      APPSERVER  NEW YORK
63/147
Partial R2RML Mapping Document
@prefix rr: <http://www.w3.org/ns/r2rml#>.
@prefix ex: <http://example.com/ns#>.
TURTLE
<#TriplesMap1>
rr:logicalTable [ rr:tableName "EMP" ];
rr:subjectMap [
rr:template "http://data.example.com/employee/{EMPNO}";
rr:class ex:Employee;
];
rr:predicateObjectMap [
rr:predicate ex:name;
rr:objectMap [ rr:column "ENAME" ];
].
64/147
Generated Triples
TURTLE
<http://data.example.com/employee/7369> rdf:type ex:Employee.
<http://data.example.com/employee/7369> ex:name "SMITH".
65/147
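What an R2RML processor does with the mapping above can be approximated in a few lines. This is only a sketch of the idea, not an R2RML implementation: the table is an in-memory list of dictionaries, prefixed names are kept as plain strings, and the helper names `fill` and `map_employees` are made up.

```python
import re
from urllib.parse import quote

# In-memory stand-in for the EMP table from the slide.
EMP = [{"EMPNO": 7369, "ENAME": "SMITH", "JOB": "CLERK", "DEPTNO": 10}]


def fill(template, row):
    """Replace each {COLUMN} in the template with the row's percent-encoded value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: quote(str(row[m.group(1)]), safe=""),
        template,
    )


def map_employees(rows):
    """Mimic <#TriplesMap1>: one class triple and one ex:name triple per row."""
    triples = []
    for row in rows:
        subject = fill("http://data.example.com/employee/{EMPNO}", row)
        triples.append((subject, "rdf:type", "ex:Employee"))  # rr:class
        triples.append((subject, "ex:name", row["ENAME"]))    # rr:column "ENAME"
    return triples


for triple in map_employees(EMP):
    print(triple)
```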
Generating Views
<#DeptTableView> rr:sqlQuery """
SELECT DEPTNO,
DNAME,
LOC,
(SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
FROM DEPT;
""".
TURTLE
66/147
Working with Views
<#TriplesMap2>
rr:logicalTable <#DeptTableView>;
rr:subjectMap [
rr:template "http://data.example.com/department/{DEPTNO}";
rr:class ex:Department;
];
rr:predicateObjectMap [
rr:predicate ex:name;
rr:objectMap [ rr:column "DNAME" ];
];
rr:predicateObjectMap [
rr:predicate ex:location;
rr:objectMap [ rr:column "LOC" ];
];
rr:predicateObjectMap [
rr:predicate ex:staff;
rr:objectMap [ rr:column "STAFF" ];
].
TURTLE
67/147
Generated Triples
TURTLE
<http://data.example.com/department/10> rdf:type ex:Department.
<http://data.example.com/department/10> ex:name "APPSERVER".
<http://data.example.com/department/10> ex:location "NEW YORK".
<http://data.example.com/department/10> ex:staff 1.
68/147
Linking Two Tables
<#TriplesMap1>
rr:predicateObjectMap [
rr:predicate ex:department;
rr:objectMap [
rr:parentTriplesMap <#TriplesMap2>;
rr:joinCondition [
rr:child "DEPTNO";
rr:parent "DEPTNO";
];
];
].
TURTLE
69/147
Generated Triples
TURTLE
<http://data.example.com/employee/7369> ex:department
<http://data.example.com/department/10>.
70/147
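The join condition can be pictured as an ordinary equi-join whose output is a link between two subject URIs. A sketch under the same assumptions as before (in-memory rows, prefixed names as strings, hypothetical function name `join_triples`):

```python
EMP = [{"EMPNO": 7369, "ENAME": "SMITH", "DEPTNO": 10}]
DEPT = [{"DEPTNO": 10, "DNAME": "APPSERVER", "LOC": "NEW YORK"}]


def join_triples(children, parents, child_col, parent_col):
    """Emit ex:department links wherever child_col equals parent_col.

    The object of each triple is the subject URI the parent map would
    generate, which is what rr:parentTriplesMap expresses.
    """
    triples = []
    for child in children:
        for parent in parents:
            if child[child_col] == parent[parent_col]:
                triples.append((
                    f"http://data.example.com/employee/{child['EMPNO']}",
                    "ex:department",
                    f"http://data.example.com/department/{parent['DEPTNO']}",
                ))
    return triples


links = join_triples(EMP, DEPT, "DEPTNO", "DEPTNO")
print(links)
```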
Employee Table
EMPNO  ENAME  JOB
7369   SMITH  CLERK
7369   SMITH  NIGHTGUARD
7400   JONES  ENGINEER
71/147
Department Table
DEPTNO  DNAME      LOC
10      APPSERVER  NEW YORK
20      RESEARCH   BOSTON
72/147
Employee-Department Table
EMPNO  DEPTNO
7369   10
7369   20
7400   10
73/147
Many-to-Many Tables
TURTLE
<#TriplesMap3>
rr:logicalTable [ rr:tableName "EMP2DEPT" ];
rr:subjectMap [
rr:template "http://data.example.com/employee/{EMPNO}";
];
rr:predicateObjectMap [
rr:predicate ex:department;
rr:objectMap [ rr:template "http://data.example.com/department/{DEPTNO}" ];
].
74/147
Generated Triples
TURTLE
<http://data.example.com/employee/7369>
ex:department <http://data.example.com/department/10> ;
ex:department <http://data.example.com/department/20> .
<http://data.example.com/employee/7400>
ex:department <http://data.example.com/department/10>.
75/147
Translating Columns into IRIs
TURTLE
<#TriplesMap1>
rr:logicalTable [ rr:sqlQuery """
SELECT EMP.*, (CASE JOB
WHEN 'CLERK' THEN 'general-office'
WHEN 'NIGHTGUARD' THEN 'security'
WHEN 'ENGINEER' THEN 'engineering'
END) ROLE FROM EMP
""" ];
rr:subjectMap [
rr:template "http://data.example.com/employee/{EMPNO}";
];
rr:predicateObjectMap [
rr:predicate ex:role;
rr:objectMap [ rr:template "http://data.example.com/roles/{ROLE}" ];
].
76/147
Generated Triples
TURTLE
<http://data.example.com/employee/7369> ex:role
<http://data.example.com/roles/general-office>.
77/147
Demo
http://www.flickr.com/photos/61107193@N03/8964142468
Linked Data
Linked Data
· A Rebranding Exercise for the Semantic Web
· Focus is on the data
· A consistent data model for the Web
· Supports Discoverability
· Not necessarily public
80/147
Principles
http://www.w3.org/DesignIssues/LinkedData.html
· Use URIs to name things
· Use HTTP URIs to make them resolvable
· When someone resolves a URI, provide useful information via standards (SPARQL, RDF, etc.)
· Include links for discoverability
81/147
Applicability
· Links are meaningful
· Intertwingle things with documents
· Consume data from sources you have never seen
· Useful for describing services too
82/147
Naming Issue
http://bosatsu.net/people/brian
URL
http://bosatsu.net/people/brian.html
URL
http://bosatsu.net/people/brian.rdf
URL
83/147
303 Redirect
curl -i https://w3id.org/people/bsletten
HTTP
HTTP/1.1 303 See Other
Date: Thu, 27 Feb 2014 15:44:58 GMT
Server: Apache/2.2.22 (Ubuntu)
Access-Control-Allow-Origin: *
Location: http://bosatsu.net/foaf/brian.rdf
Vary: Accept-Encoding
Content-Length: 315
Content-Type: text/html; charset=iso-8859-1
84/147
200 Response
curl -i http://bosatsu.net/foaf/brian.rdf
HTTP
HTTP/1.1 200 OK
Date: Thu, 27 Feb 2014 16:01:03 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Thu, 09 May 2013 07:26:55 GMT
ETag: "402ab-2242-4dc43f9942dc0"
Accept-Ranges: bytes
Content-Length: 8770
Content-Type: application/rdf+xml
85/147
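The two curl calls can be read as one small client algorithm: a 303 means "this URI names a thing; a description of it lives over there". A sketch with canned responses instead of network access (`RESPONSES` and `describe` are illustrative names, and only the headers shown on the slides are reproduced):

```python
# Canned (status, headers) pairs standing in for the two curl calls.
RESPONSES = {
    "https://w3id.org/people/bsletten":
        (303, {"Location": "http://bosatsu.net/foaf/brian.rdf"}),
    "http://bosatsu.net/foaf/brian.rdf":
        (200, {"Content-Type": "application/rdf+xml"}),
}


def describe(uri, fetch, max_hops=5):
    """Follow 303 See Other hops and return (document_url, content_type)."""
    for _ in range(max_hops):
        status, headers = fetch(uri)
        if status == 303:
            uri = headers["Location"]      # the thing itself is not a document
        elif status == 200:
            return uri, headers["Content-Type"]
        else:
            raise ValueError(f"unexpected status {status} for {uri}")
    raise ValueError("too many redirects")


document, content_type = describe("https://w3id.org/people/bsletten", RESPONSES.__getitem__)
print(document, content_type)
```

The name of the person and the address of the RDF document stay distinct, which is the point of the naming issue raised earlier.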
Fragment Identifiers
· Not everyone loves the 303 solution
· http://bosatsu.net/foaf#me
· Not directly resolvable
· Fragments are not sent to the server
86/147
200 Response
curl -i http://bosatsu.net/foaf#me
HTTP
HTTP/1.1 200 OK
Date: Thu, 27 Feb 2014 16:01:03 GMT
Server: Apache/2.2.16 (Debian)
Last-Modified: Thu, 09 May 2013 07:26:55 GMT
ETag: "402ab-2242-4dc43f9942dc0"
Accept-Ranges: bytes
Content-Length: 8770
Content-Type: application/rdf+xml
87/147
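The fragment approach works because the client strips `#me` before making the request and applies it only to what comes back. A short illustration using the standard library's `urldefrag` (the wrapper name `split_identifier` is made up):

```python
from urllib.parse import urldefrag


def split_identifier(uri):
    """Return (URL actually requested, fragment kept on the client side)."""
    url, fragment = urldefrag(uri)
    return url, fragment


requested, local_name = split_identifier("http://bosatsu.net/foaf#me")
print(requested)   # what the server sees
print(local_name)  # what the client looks up inside the returned document
```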
Resource Description Framework (RDF)
· W3C Standard
· Graph-oriented
· URIs to identify subjects *AND* relationships
· SPARQL Protocol and RDF Query Language
88/147
@sandhawke
89/147
Linking Open Data Project
· Started in 2007 by the W3C Semantic Web Education and Outreach (SWEO) Interest Group
· Make data freely available
· Doubled in size every 10 months
90/147
Domain                  # Datasets  # Triples       # Links
Media                   25          1,800,000,000   50,000,000
Geographic              31          6,000,000,000   35,000,000
Government              49          13,000,000,000  19,000,000
Publications            87          2,900,000,000   140,000,000
Cross-Domain            41          4,100,000,000   63,000,000
Life Sciences           41          3,000,000,000   191,000,000
User-Generated Content  20          134,000,000     3,400,000
Total                   295         31,000,000,000  504,000,000
http://lod-cloud.net/state/
102/147
DBPedia
· Linked Dataset derived from Wikipedia
· Creative Commons Attribution-ShareAlike 3.0 License
· GNU Free Documentation License
· Multi-domain
· Consensus-based
· Kept current by Wikipedia activity
· Multi-lingual
106/147
DBPedia Numbers (English Version)
http://dbpedia.org/About
· Describes 4 million things
· 3.22 million are classified by an ontology
· 832,000 people
· 639,000 places
· 372,000 creative works
· 209,000 organizations
· 226,000 species
· 5,600 diseases
107/147
DBPedia Numbers (Non-English Version)
http://dbpedia.org/About
· 119 Localized Language Versions
· Describe 24.9 million things (w/ repetition)
· 16.8 million are connected to English DBPedia
108/147
DBPedia Summary
http://wiki.dbpedia.org/Datasets39/DatasetStatistics?v=dqp
· Overall 12.6 million unique things
· 24.6 million links to images
· 27.6 million links to pages
· 45 million links to other RDF datasets
· 67 million links to Wikipedia categories
· 41.2 million links to YAGO categories
· 2.46 billion RDF triples
· 470 million (English), 1.98 billion (Non-English)
109/147
Use Cases
http://wiki.dbpedia.org/UseCases?v=ene
· Improve Wikipedia Search
· Include DBPedia data in your documents
· Support for Geographic Data
· Documentation Classification, Annotation
· Multi-Domain Ontology
110/147
DBPedia
http://dbpedia.org
111/147
Most Important Query Ever Run
http://tinyurl.com/n9hhs68
112/147
Linked MDB
http://data.linkedmdb.org
113/147
Freebase
http://freebase.com
114/147
Data.gov
http://data.gov
115/147
Linked Life Data
http://linkedlifedata.com
116/147
Dydra
http://dydra.com
http://dydra.com/sp2b/sp2b-10k/
117/147
curl -H 'Accept: application/sparql-results+json' \
'http://s4TFW7HEhOyDgTZobpqY@dydra.com/bosatsu/test/sparql?query=select%20%2A%20where%20%7B%3Fs%20%3Fp%20%3Fo%7D%20limit%2010'
COMMAND
SPARQL-RESULTS+JSON
{ "head": { "vars": [ "s", "p", "o" ] },
"results": {
"bindings": [
{ "s": {"type":"bnode", "value":"genid1"},
"p": {"type":"uri", "value":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
"o": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/Person"} },
{ "s": {"type":"bnode", "value":"genid1"},
"p": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/name"},
"o": {"type":"literal", "value":"Ora Lassila"} },
{ "s": {"type":"bnode", "value":"genid1"},
"p": {"type":"uri", "value":"http://www.w3.org/2000/01/rdf-schema#seeAlso"},
"o": {"type":"uri", "value":"http://lassila.org/ora.rdf#me"} },
{ "s": {"type":"bnode", "value":"genid2"},
"p": {"type":"uri", "value":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
"o": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/Person"} },
...
{ "s": {"type":"bnode", "value":"genid4"},
"p": {"type":"uri", "value":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
"o": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/Person"} }
]
}
}
118/147
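The same exchange can be sketched from a program's point of view: percent-encode the query into the URL, then flatten the `bindings` of the SPARQL JSON results into rows. This sketch makes no network call; the endpoint `http://example.org/sparql` is a placeholder, the helper names are made up, and `SAMPLE` is a one-binding excerpt of the response shown above.

```python
import json
from urllib.parse import quote


def query_url(endpoint, sparql):
    """Build a GET URL carrying the percent-encoded query."""
    return endpoint + "?query=" + quote(sparql, safe="")


def rows(results_json):
    """Flatten SPARQL JSON results into tuples ordered by head.vars."""
    doc = json.loads(results_json)
    names = doc["head"]["vars"]
    return [
        tuple(binding[n]["value"] if n in binding else None for n in names)
        for binding in doc["results"]["bindings"]
    ]


SAMPLE = """
{ "head": { "vars": [ "s", "p", "o" ] },
  "results": { "bindings": [
    { "s": {"type": "bnode", "value": "genid1"},
      "p": {"type": "uri", "value": "http://xmlns.com/foaf/0.1/name"},
      "o": {"type": "literal", "value": "Ora Lassila"} }
  ] } }
"""

url = query_url("http://example.org/sparql", "select * where {?s ?p ?o} limit 10")
print(url)
for row in rows(SAMPLE):
    print(row)
```

The encoded query string comes out identical to the one in the curl command, so the command line and the program are issuing the same request.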
datahub.io
http://datahub.io
119/147
ProductDB
http://productdb.org
120/147
Flickr Wrappr
http://wifo5-03.informatik.uni-mannheim.de/flickrwrappr/
121/147
Artists for St. Patrick's Day
· Find music recommendations related to St. Patrick's Day
· Use DBPedia to find musical artists who are from Ireland
122/147
SPARQL
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?name ?person ?artist WHERE {
  ?person foaf:name ?name .
  ?person rdf:type <http://dbpedia.org/ontology/MusicalArtist> .
  ?person <http://dbpedia.org/ontology/associatedMusicalArtist> ?artist .
  {
    ?person dbo:hometown <http://dbpedia.org/resource/Republic_of_Ireland> .
  }
  UNION
  {
    ?person dbo:birthPlace <http://dbpedia.org/resource/Republic_of_Ireland> .
  }
}
ORDER BY ?name
http://tinyurl.com/jwtt2aj
URL
123/147
124/147
SPARQL + R
http://linkedscience.org/tools/sparql-package-for-r/
125/147
Set Up SPARQL Query
R
library(SPARQL) # SPARQL querying package
library(ggplot2)
# Step 1 - Set up preliminaries and define query
# Define the data.gov endpoint
endpoint <- "http://services.data.gov/sparql"
# create query statement
query <- "PREFIX dgp1187: <http://data-gov.tw.rpi.edu/vocab/p/1187/>
SELECT ?ye ?fi ?ac
WHERE {
  ?s dgp1187:year ?ye .
  ?s dgp1187:fires ?fi .
  ?s dgp1187:acres ?ac .
}"
126/147
Process SPARQL Query
R
# Step 2 - Use SPARQL package to submit query and save results to a data frame
qd <- SPARQL(endpoint,query)
df <- qd$results
# Step 3 - Prep for graphing
# Numbers are usually returned as characters, so convert to numeric and create a
# variable for "average acres burned per fire"
str(df)
df <- as.data.frame(apply(df, 2, as.numeric))
str(df)
df$avgperfire <- df$ac/df$fi
127/147
Plot Results
R
# Step 4 - Plot some data
# Each "+" must end a line; a line that starts with "+" is parsed as a new expression
ggplot(df, aes(x=ye, y=avgperfire, group=1)) +
  geom_point() + stat_smooth() +
  scale_x_continuous(breaks=seq(1960, 2008, 5)) +
  xlab("Year") +
  ylab("Average acres burned per fire")
ggplot(df, aes(x=ye, y=fi, group=1)) +
  geom_point() + stat_smooth() +
  scale_x_continuous(breaks=seq(1960, 2008, 5)) +
  xlab("Year") +
  ylab("Number of fires")
ggplot(df, aes(x=ye, y=ac, group=1)) +
  geom_point() + stat_smooth() +
  scale_x_continuous(breaks=seq(1960, 2008, 5)) +
  xlab("Year") +
  ylab("Acres burned")
128/147
FOAF Explorer
http://xml.mfd-consult.dk/foaf/explorer/
129/147
RelFinder
http://www.visualdataweb.org/relfinder.php
130/147
JSON-LD
http://json-ld.org
131/147
JSON-LD Actions in Inbox
https://developers.google.com/gmail/actions/reference/formats/json-ld
132/147
Adding Tour Dates to the Knowledge Graph
http://googlewebmastercentral.blogspot.co.uk/2014/03/musical-artistsyour-official-tour.html
133/147
Books
Questions?
brian@bosatsu.net
@bsletten
http://tinyurl.com/bjs-gplus
bsletten