The Semantic Web
Brian Sletten
@bsletten
04/02/2014

Speaker Qualifications
· Specialize in next-generation technologies
· Author of "Resource-Oriented Architecture Patterns for Webs of Data"
· One of Top 100 Semantic Web People

Agenda
· Introduction
· REST
· RDF
· SPARQL
· RDFa
· R2RML
· Linked Data

Introduction

Where We Do Data Integration
· Databases
· Code
· Services
· Data Warehouses

“If we had individual words to represent every particularity we would have to have an infinite number of them, which would exceed our capability of learning, recalling and manipulating them.”
Daniel Chandler

“[i]f words had the job of representing concepts fixed in advance, one would be able to find exact equivalents for them as between one language and another. But this is not the case.”
Ferdinand de Saussure

“backpfeifengesicht”

“There are no 'natural' concepts or categories which are simply 'reflected' in language. Language plays a crucial role in 'constructing reality'.”
Daniel Chandler

http://www.flickr.com/photos/75467759@N00/4880012589
http://www.flickr.com/photos/iamart3/478545143
http://www.flickr.com/photos/68901280@N00/5250299238

REST

http://amundsen.com/media-types/collection/

application/vnd.collection+json
http://amundsen.com/blog/

Collection+JSON
JSON
    {
      "collection" : {
        "version" : "1.0",
        "href" : "http://example.org/friends/"
      }
    }

Collection+JSON
JSON
    {
      "collection" : {
        "version" : "1.0",
        "href" : "http://example.org/friends/",
        "links" : [],
        "items" : [],
        "queries" : [],
        "template" : {},
        "error" : {}
      }
    }

http://example.org/api
JSON
    {
      "collection": {
        "version": "1.0",
        "href": "http://example.org/api",
        "links": [
          { "rel": "account", "href": "http://example.org/account" },
          { "rel": "order", "href": "http://example.org/order" },
          { "rel": "product", "href": "http://example.org/product" }
        ]
      }
    }

http://example.org/account
JSON
    {
      "collection": {
        "version": "1.0",
        "href": "http://example.org/account",
        "links": [
          { "rel": "next", "href": "http://example.org/account;page=2" }
        ],
        "items": [],
        "queries": [],
        "template": {}
      }
    }

http://example.org/account;page=2
JSON
    {
      "collection": {
        "version": "1.0",
        "href": "http://example.org/account;page=2",
        "links": [
          { "rel": "prev", "href": "http://example.org/account" },
          { "rel": "next", "href": "http://example.org/account;page=3" }
        ],
        "items": [],
        "queries": [],
        "template": {}
      }
    }

http://example.org/account
JSON
    {
      "collection": {
        "version": "1.0",
        "href": "http://example.org/account",
        ...
        "items": [ ],
        ...
      }
    }

http://example.org/account
JSON
    {
      ...
      "items": [
        {
          "href": "/account/id/9468",
          "data": [
            { "name": "username", "value": "bob" },
            { "name": "id", "value": "9468" }
          ],
          "links": [
            { "rel": "open", "href": "/order/account/id/9468;status=open" },
            { "rel": "recent", "href": "/order/account/id/9468;status=recent" }
          ]
        }
      ]
      ...
    }

http://example.org/account
JSON
    {
      "collection": {
        "version": "1.0",
        "href": "http://example.org/account",
        ...
        "queries": [ ],
        ...
      }
    }

http://example.org/account
JSON
    {
      ...
      "queries": [
        {
          "encoding": "uri-template",
          "rel" : "search",
          "href" : "/account{;status,page,ipp}",
          "data": [
            { "name": "status", "value": "" },
            { "name": "page", "value": "" },
            { "name": "ipp", "value" : "" }
          ]
        }
      ]
      ...
    }

http://example.org/account;status=open;page=2
JSON
    {
      ...
      "queries": [
        {
          "encoding": "uri-template",
          "rel" : "search",
          "href" : "/account{;status,page,ipp}",
          "data": [
            { "name": "status", "value": "open" },
            { "name": "page", "value": "2" },
            { "name": "ipp", "value" : "" }
          ]
        }
      ]
      ...
    }

RDF

Resource Description Framework (RDF)
· W3C Standard
· Graph-oriented
· URIs to identify subjects *AND* relationships
· SPARQL Protocol and RDF Query Language

Everything You Know About Something
    ID      Col1    Col2    Col3    Col4    Col5    Col6    ....  ColN
    Thing1  Value1  Value2  Value3  Value4  Value5  Value6  ....  ValueN

Everything You Know About Everything
[Table: rows Thing1 to Thing4 against columns ID, Col1 to ColN; some cells are empty]

Distribute Rows in their Entirety
[Table: only the rows for Thing1 and Thing3, across columns ID, Col1 to ColN]

Distribute Columns in their Entirety
[Table: only columns ID, Col2, Col3, Col5 and ColN, for rows Thing1, Thing3 and Thing4]

Distribute Arbitrary Cells
[Table: scattered cells from columns Col2, Col3, Col5 and ColN, for rows Thing1, Thing3 and Thing4]
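The idea behind distributing arbitrary cells can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `http://example.org/thing/` and `http://example.org/col/` namespaces and the helper `row_to_triples` are hypothetical names, not part of the talk. Each non-empty cell becomes one (subject, predicate, object) statement, so two sources holding different subsets of cells can be merged with a plain set union.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical namespaces, chosen only for this illustration.
THING = "http://example.org/thing/"
COL = "http://example.org/col/"


def row_to_triples(row_id: str, cells: Dict[str, Optional[str]]) -> Set[Triple]:
    """Turn each non-empty cell of one table row into a (subject, predicate, object) triple."""
    return {
        (THING + row_id, COL + column, value)
        for column, value in cells.items()
        if value is not None  # empty cells simply produce no triple
    }


# Two sources each hold an arbitrary subset of the cells for overlapping rows.
source_a = row_to_triples("Thing1", {"Col3": "Value3"}) | row_to_triples(
    "Thing4", {"Col2": "Value2", "Col3": "Value3"}
)
source_b = row_to_triples("Thing1", {"Col3": "Value3", "ColN": "ValueN"}) | row_to_triples(
    "Thing3", {"Col5": "Value5", "ColN": None}
)

# Merging is a set union: the shared URIs line the cells up again,
# and a cell reported by both sources is stored only once.
merged = source_a | source_b

for subject, predicate, obj in sorted(merged):
    print(subject, predicate, obj)
```

Because the row and column identifiers are global URIs rather than local table positions, neither source needs to know the other's schema before the merge.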
URIs to identify Rows/Cols
· Global
· Interoperable
· Decentralized
· Resolvable

SPARQL

RDFa

RDFa Lite
http://www.w3.org/TR/rdfa-lite/

(Basically) Unstructured Text
Published (theoretically) at http://example.com/src.html
HTML
    <p>
      My name is Manu Sporny and you can give me a ring via 1-800-555-0199.
    </p>

Identifying Vocabulary
HTML
    <p vocab="http://schema.org/">
      My name is Manu Sporny and you can give me a ring via 1-800-555-0199.
    </p>

Class Instance
HTML
    <p vocab="http://schema.org/" typeof="Person">
      My name is Manu Sporny and you can give me a ring via 1-800-555-0199.
    </p>

Generated Triples
TURTLE
    @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
    @prefix schema: <http://schema.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    <http://example.com/src.html> rdfa:usesVocabulary schema: .
    _:1 rdf:type schema:Person .

Adding Properties
HTML
    <p vocab="http://schema.org/" typeof="Person">
      My name is <span property="name">Manu Sporny</span>
      and you can give me a ring via
      <span property="telephone">1-800-555-0199</span>
      or visit <a property="url" href="http://manu.sporny.org/">my homepage</a>.
    </p>

Generated Triples
TURTLE
    @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
    @prefix schema: <http://schema.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    <http://example.com/src.html> rdfa:usesVocabulary schema: .
    _:1 rdf:type schema:Person;
        schema:name "Manu Sporny";
        schema:telephone "1-800-555-0199";
        schema:url <http://manu.sporny.org/> .

Adding Resource Identifier
HTML
    <p vocab="http://schema.org/" resource="#manu" typeof="Person">
      My name is <span property="name">Manu Sporny</span>
      and you can give me a ring via
      <span property="telephone">1-800-555-0199</span>.
      <img property="image" src="http://manu.sporny.org/images/manu.png" />
    </p>

Generated Triples
TURTLE
    @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
    @prefix schema: <http://schema.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    <http://example.com/src.html> rdfa:usesVocabulary schema: .
    <http://example.com/src.html#manu> rdf:type schema:Person;
        schema:name "Manu Sporny";
        schema:telephone "1-800-555-0199";
        schema:image <http://manu.sporny.org/images/manu.png> .

Multiple Vocabularies
HTML
    <p vocab="http://schema.org/"
       prefix="ov: http://open.vocab.org/terms/"
       resource="#manu" typeof="Person">
      My name is <span property="name">Manu Sporny</span>
      and you can give me a ring via
      <span property="telephone">1-800-555-0199</span>.
      <img property="image" src="http://manu.sporny.org/images/manu.png" />
      My favorite animal is the <span property="ov:preferredAnimal">Liger</span>.
    </p>

Generated Triples
TURTLE
    @prefix rdfa: <http://www.w3.org/ns/rdfa#> .
    @prefix schema: <http://schema.org/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    <http://example.com/src.html> rdfa:usesVocabulary schema: .
    <http://example.com/src.html#manu> rdf:type schema:Person;
        schema:name "Manu Sporny";
        schema:telephone "1-800-555-0199";
        schema:image <http://manu.sporny.org/images/manu.png>;
        <http://open.vocab.org/terms/preferredAnimal> "Liger" .

Demo
http://www.flickr.com/photos/61107193@N03/8964142468

R2RML

R2RML
http://www.w3.org/TR/r2rml/

Employee Table
    EMPNO  ENAME  JOB    DEPTNO
    7369   SMITH  CLERK  10

Department Table
    DEPTNO  DNAME      LOC
    10      APPSERVER  NEW YORK

Partial R2RML Mapping Document
TURTLE
    @prefix rr: <http://www.w3.org/ns/r2rml#>.
    @prefix ex: <http://example.com/ns#>.

    <#TriplesMap1>
        rr:logicalTable [ rr:tableName "EMP" ];
        rr:subjectMap [
            rr:template "http://data.example.com/employee/{EMPNO}";
            rr:class ex:Employee;
        ];
        rr:predicateObjectMap [
            rr:predicate ex:name;
            rr:objectMap [ rr:column "ENAME" ];
        ].

Generated Triples
TURTLE
    <http://data.example.com/employee/7369> rdf:type ex:Employee.
    <http://data.example.com/employee/7369> ex:name "SMITH".
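To make the mapping above concrete, here is a minimal Python sketch of what a processor does with `<#TriplesMap1>` for one row of EMP. It is not an R2RML implementation: `expand_template` and `map_employee` are hypothetical helpers written for this illustration, and the percent-encoding via `urllib.parse.quote` only approximates the specification's "IRI-safe" rule for template values.

```python
import re
from typing import Dict, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.com/ns#"


def expand_template(template: str, row: Dict[str, object]) -> str:
    """Replace each {COLUMN} placeholder with the percent-encoded column value."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda match: quote(str(row[match.group(1)]), safe=""),
        template,
    )


def map_employee(row: Dict[str, object]) -> List[Triple]:
    """Hand-coded stand-in for <#TriplesMap1>: one class triple and one name triple."""
    subject = expand_template("http://data.example.com/employee/{EMPNO}", row)
    return [
        (subject, RDF_TYPE, EX + "Employee"),      # from rr:class ex:Employee
        (subject, EX + "name", str(row["ENAME"])),  # from rr:column "ENAME"
    ]


# The single row shown in the Employee Table slide.
emp_row = {"EMPNO": 7369, "ENAME": "SMITH", "JOB": "CLERK", "DEPTNO": 10}

for triple in map_employee(emp_row):
    print(triple)
```

The subject IRI comes entirely from the template and the row's key, which is why the same employee gets the same IRI no matter which mapping or table mentions it.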
Generating Views
TURTLE
    <#DeptTableView> rr:sqlQuery """
        SELECT DEPTNO,
               DNAME,
               LOC,
               (SELECT COUNT(*) FROM EMP WHERE EMP.DEPTNO=DEPT.DEPTNO) AS STAFF
        FROM DEPT;
        """.

Working with Views
TURTLE
    <#TriplesMap2>
        rr:logicalTable <#DeptTableView>;
        rr:subjectMap [
            rr:template "http://data.example.com/department/{DEPTNO}";
            rr:class ex:Department;
        ];
        rr:predicateObjectMap [
            rr:predicate ex:name;
            rr:objectMap [ rr:column "DNAME" ];
        ];
        rr:predicateObjectMap [
            rr:predicate ex:location;
            rr:objectMap [ rr:column "LOC" ];
        ];
        rr:predicateObjectMap [
            rr:predicate ex:staff;
            rr:objectMap [ rr:column "STAFF" ];
        ].

Generated Triples
TURTLE
    <http://data.example.com/department/10> rdf:type ex:Department.
    <http://data.example.com/department/10> ex:name "APPSERVER".
    <http://data.example.com/department/10> ex:location "NEW YORK".
    <http://data.example.com/department/10> ex:staff 1.

Linking Two Tables
TURTLE
    <#TriplesMap1>
        rr:predicateObjectMap [
            rr:predicate ex:department;
            rr:objectMap [
                rr:parentTriplesMap <#TriplesMap2>;
                rr:joinCondition [
                    rr:child "DEPTNO";
                    rr:parent "DEPTNO";
                ];
            ];
        ].

Generated Triples
TURTLE
    <http://data.example.com/employee/7369> ex:department <http://data.example.com/department/10>.

Employee Table
    EMPNO  ENAME  JOB
    7369   SMITH  CLERK
    7369   SMITH  NIGHTGUARD
    7400   JONES  ENGINEER

Department Table
    DEPTNO  DNAME      LOC
    10      APPSERVER  NEW YORK
    20      RESEARCH   BOSTON

Employee-Department Table
    EMPNO  DEPTNO
    7369   10
    7369   20
    7400   10

Many-to-Many Tables
TURTLE
    <#TriplesMap3>
        rr:logicalTable [ rr:tableName "EMP2DEPT" ];
        rr:subjectMap [
            rr:template "http://data.example.com/employee/{EMPNO}";
        ];
        rr:predicateObjectMap [
            rr:predicate ex:department;
            rr:objectMap [ rr:template "http://data.example.com/department/{DEPTNO}" ];
        ].

Generated Triples
TURTLE
    <http://data.example.com/employee/7369>
        ex:department <http://data.example.com/department/10> ;
        ex:department <http://data.example.com/department/20> .
    <http://data.example.com/employee/7400>
        ex:department <http://data.example.com/department/10>.

Translating Columns into IRIs
TURTLE
    <#TriplesMap1>
        rr:logicalTable [ rr:sqlQuery """
            SELECT EMP.*, (CASE JOB
                WHEN 'CLERK' THEN 'general-office'
                WHEN 'NIGHTGUARD' THEN 'security'
                WHEN 'ENGINEER' THEN 'engineering'
            END) ROLE FROM EMP
            """ ];
        rr:subjectMap [
            rr:template "http://data.example.com/employee/{EMPNO}";
        ];
        rr:predicateObjectMap [
            rr:predicate ex:role;
            rr:objectMap [ rr:template "http://data.example.com/roles/{ROLE}" ];
        ].

Generated Triples
TURTLE
    <http://data.example.com/employee/7369> ex:role <http://data.example.com/roles/general-office>.

Demo
http://www.flickr.com/photos/61107193@N03/8964142468

Linked Data

Linked Data
· A Rebranding Exercise for the Semantic Web
· Focus is on the data
· A consistent data model for the Web
· Supports Discoverability
· Not necessarily public

Principles
http://www.w3.org/DesignIssues/LinkedData.html
· Use URIs to name things
· Use HTTP URIs to make them resolvable
· When someone resolves a URI, provide useful information via standards (SPARQL, RDF, etc.)
· Include links for discoverability

Applicability
· Links are meaningful
· Intertwingle things with documents
· Consume data from sources you have never seen
· Useful for describing services too

Naming Issue
URL
    http://bosatsu.net/people/brian
URL
    http://bosatsu.net/people/brian.html
URL
    http://bosatsu.net/people/brian.rdf

303 Redirect
HTTP
    curl -i https://w3id.org/people/bsletten

    HTTP/1.1 303 See Other
    Date: Thu, 27 Feb 2014 15:44:58 GMT
    Server: Apache/2.2.22 (Ubuntu)
    Access-Control-Allow-Origin: *
    Location: http://bosatsu.net/foaf/brian.rdf
    Vary: Accept-Encoding
    Content-Length: 315
    Content-Type: text/html; charset=iso-8859-1

200 Response
HTTP
    curl -i http://bosatsu.net/foaf/brian.rdf

    HTTP/1.1 200 OK
    Date: Thu, 27 Feb 2014 16:01:03 GMT
    Server: Apache/2.2.16 (Debian)
    Last-Modified: Thu, 09 May 2013 07:26:55 GMT
    ETag: "402ab-2242-4dc43f9942dc0"
    Accept-Ranges: bytes
    Content-Length: 8770
    Content-Type: application/rdf+xml

Fragment Identifiers
· Not everyone loves the 303 solution
· http://bosatsu.net/foaf#me
· Not directly resolvable
· Fragments are not sent to the server

200 Response
HTTP
    curl -i http://bosatsu.net/foaf#me

    HTTP/1.1 200 OK
    Date: Thu, 27 Feb 2014 16:01:03 GMT
    Server: Apache/2.2.16 (Debian)
    Last-Modified: Thu, 09 May 2013 07:26:55 GMT
    ETag: "402ab-2242-4dc43f9942dc0"
    Accept-Ranges: bytes
    Content-Length: 8770
    Content-Type: application/rdf+xml

Resource Description Framework (RDF)
· W3C Standard
· Graph-oriented
· URIs to identify subjects *AND* relationships
· SPARQL Protocol and RDF Query Language

@sandhawke

Linking Open Data Project
· Started in 2007 by W3C Semantic Web Education and Outreach (SWEO) Interest Group
· Make data freely available
· Doubled in size every 10 months

http://lod-cloud.net/state/
    Domain                  # Datasets  # Triples       # Links
    Media                   25          1,800,000,000   50,000,000
    Geographic              31          6,000,000,000   35,000,000
    Government              49          13,000,000,000  19,000,000
    Publications            87          2,900,000,000   140,000,000
    Cross-Domain            41          4,100,000,000   63,000,000
    Life Sciences           41          3,000,000,000   191,000,000
    User-Generated Content  20          134,000,000     3,400,000
    Total                   295         31,000,000,000  504,000,000

DBpedia
· Linked Dataset derived from Wikipedia
· Creative Commons Attribution-ShareAlike 3.0 License
· GNU Free Documentation License
· Multi-domain
· Consensus-based
· Kept current by Wikipedia activity
· Multi-lingual

DBpedia Numbers (English Version)
http://dbpedia.org/About
· Describes 4 million things
· 3.22 million are classified by an ontology
· 832,000 people
· 639,000 places
· 372,000 creative works
· 209,000 organizations
· 226,000 species
· 5,600 diseases

DBpedia Numbers (Non-English Version)
http://dbpedia.org/About
· 119 Localized Language Versions
· Describe 24.9 million things (w/ repetition)
· 16.8 million are connected to English DBpedia

DBpedia Summary
http://wiki.dbpedia.org/Datasets39/DatasetStatistics?v=dqp
· Overall 12.6 million unique things
· 24.6 million links to images
· 27.6 million links to pages
· 45 million links to other RDF datasets
· 67 million links to Wikipedia categories
· 41.2 million links to YAGO categories
· 2.46 billion RDF triples
· 470 million (English), 1.98 billion (Non-English)

Use Cases
http://wiki.dbpedia.org/UseCases?v=ene
· Improve Wikipedia Search
· Include DBpedia data in your documents
· Support for Geographic Data
· Documentation Classification, Annotation
· Multi-Domain Ontology

DBpedia
http://dbpedia.org

Most Important Query Ever Run
http://tinyurl.com/n9hhs68

Linked MDB
http://data.linkedmdb.org

Freebase
http://freebase.com

Data.gov
http://data.gov

Linked Life Data
http://linkedlifedata.com

Dydra
http://dydra.com
http://dydra.com/sp2b/sp2b-10k/

COMMAND
    curl -H 'Accept: application/sparql-results+json' \
      'http://s4TFW7HEhOyDgTZobpqY@dydra.com/bosatsu/test/sparql?query=select%20%2A%20where%20%7B%3Fs%20%3Fp%20%3Fo%7D%20limit%2010'

SPARQL-RESULTS+JSON
    {
      "head": { "vars": [ "s", "p", "o" ] },
      "results": {
        "bindings": [
          { "s": {"type":"bnode", "value":"genid1"},
            "p": {"type":"uri", "value":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
            "o": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/Person"} },
          { "s": {"type":"bnode", "value":"genid1"},
            "p": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/name"},
            "o": {"type":"literal", "value":"Ora Lassila"} },
          { "s": {"type":"bnode", "value":"genid1"},
            "p": {"type":"uri", "value":"http://www.w3.org/2000/01/rdf-schema#seeAlso"},
            "o": {"type":"uri", "value":"http://lassila.org/ora.rdf#me"} },
          { "s": {"type":"bnode", "value":"genid2"},
            "p": {"type":"uri", "value":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
            "o": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/Person"} },
          ...
          { "s": {"type":"bnode", "value":"genid4"},
            "p": {"type":"uri", "value":"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
            "o": {"type":"uri", "value":"http://xmlns.com/foaf/0.1/Person"} }
        ]
      }
    }

datahub.io
http://datahub.io

ProductDB
http://productdb.org

FlickrWrapper
http://wifo5-03.informatik.uni-mannheim.de/flickrwrappr/

Artists for St. Patrick's Day
· Find music recommendations related to St. Patrick's Day
· Use DBpedia to find musical artists who are from Ireland

SPARQL
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT DISTINCT ?name ?person ?artist WHERE {
      ?person foaf:name ?name .
      ?person rdf:type dbo:MusicalArtist .
      ?person dbo:associatedMusicalArtist ?artist .
      {
        ?person dbo:hometown <http://dbpedia.org/resource/Republic_of_Ireland> .
      } UNION {
        ?person dbo:birthPlace <http://dbpedia.org/resource/Republic_of_Ireland> .
      }
    }
    ORDER BY ?name
URL
    http://tinyurl.com/jwtt2aj

SPARQL + R
http://linkedscience.org/tools/sparql-package-for-r/

Set Up SPARQL Query
R
    library(SPARQL)   # SPARQL querying package
    library(ggplot2)

    # Step 1 - Set up preliminaries and define query
    # Define the data.gov endpoint
    endpoint <- "http://services.data.gov/sparql"

    # create query statement
    query <-
    "PREFIX dgp1187: <http://data-gov.tw.rpi.edu/vocab/p/1187/>
    SELECT ?ye ?fi ?ac
    WHERE {
      ?s dgp1187:year ?ye .
      ?s dgp1187:fires ?fi .
      ?s dgp1187:acres ?ac .
    }"

Process SPARQL Query
R
    # Step 2 - Use SPARQL package to submit query and save results to a data frame
    qd <- SPARQL(endpoint, query)
    df <- qd$results

    # Step 3 - Prep for graphing
    # Numbers are usually returned as characters, so convert to numeric and create a
    # variable for "average acres burned per fire"
    str(df)
    df <- as.data.frame(apply(df, 2, as.numeric))
    str(df)
    df$avgperfire <- df$ac/df$fi

Plot Results
R
    # Step 4 - Plot some data
    ggplot(df, aes(x=ye, y=avgperfire, group=1)) +
      geom_point() +
      stat_smooth() +
      scale_x_continuous(breaks=seq(1960, 2008, 5)) +
      xlab("Year") +
      ylab("Average acres burned per fire")

    ggplot(df, aes(x=ye, y=fi, group=1)) +
      geom_point() +
      stat_smooth() +
      scale_x_continuous(breaks=seq(1960, 2008, 5)) +
      xlab("Year") +
      ylab("Number of fires")

    ggplot(df, aes(x=ye, y=ac, group=1)) +
      geom_point() +
      stat_smooth() +
      scale_x_continuous(breaks=seq(1960, 2008, 5)) +
      xlab("Year") +
      ylab("Acres burned")

FOAF Explorer
http://xml.mfd-consult.dk/foaf/explorer/

RelFinder
http://www.visualdataweb.org/relfinder.php

JSON-LD
http://json-ld.org

JSON-LD Actions in Inbox
https://developers.google.com/gmail/actions/reference/formats/json-ld

Adding Tour Dates to the Knowledge Graph
http://googlewebmastercentral.blogspot.co.uk/2014/03/musical-artistsyour-official-tour.html

Books

Questions?
brian@bosatsu.net
@bsletten
http://tinyurl.com/bjs-gplus
bsletten