Hacking Linked Data



for librarians!

A hands-on exploration of an often nebulous concept
Reinhard Engels, ABCD Library, October 2013


1. I am not an expert!
2. Apparently it takes more than a week to become one
3. Your brain may hurt


1. Convey some actual knowledge about LD
2. Let you pass a polygraph
3. Reassure that it’s OK to be confused
4. Lower the bar for asking “stupid questions”

Agenda for next ~80 minutes

1. Quick review of what Linked Data (LD) is
2. Look at some real LD (DBpedia, NY Times)
3. Make some simple LD (RDF “N-Triples”)
4. Query a remote LD source (SPARQL on DBpedia)
5. How to embed LD in HTML (RDFa et al.)
6. Ponder things that are kinda sorta like LD
7. Recover!

Linked Data: What?

• “a set of best practices for publishing and connecting structured data on the web”
• Conceived by the guy who invented the WWW
• Web of Data
• Turns the web into a giant database
• With a single, consistent API
• Simple, elegant, familiar mechanism: URIs and “typed links”

Linked Data: Why?

• For users: enables meaningful queries instead of just text string searches; research applications, consumer applications
• For creators: efficiency of not having to redundantly create and maintain data.
• One API for all data: this is a thing of beauty in itself.

Linked Data: How? (Mug version)

Linked Data: How? (Principles)

1. Use URIs as names for things.

2. Use HTTP URIs, so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs, so that they can discover more things.
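Principles 2 and 3 boil down to an ordinary HTTP request that asks for data instead of a web page. The sketch below only builds such a request and does not send it. The DBpedia resource URI and the choice of N-Triples as the requested format are assumptions for illustration, and whether a given server honors the Accept header is up to that server.

```python
from urllib.request import Request

# Registered media types for two common RDF serializations.
N_TRIPLES = "application/n-triples"
TURTLE = "text/turtle"


def build_rdf_request(uri: str, media_type: str = N_TRIPLES) -> Request:
    """Build (but do not send) an HTTP GET that asks for RDF at `uri`."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: only HTTP URIs can be looked up this way.
        raise ValueError(f"not an HTTP URI: {uri}")
    return Request(uri, headers={"Accept": media_type})


# Assumed example URI; any HTTP URI that names a thing would do.
request = build_rdf_request("http://dbpedia.org/resource/Cambridge,_Massachusetts")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
# To actually perform the lookup, pass `request` to urllib.request.urlopen().
```

Asking for a data format through the Accept header is what lets the same URI serve a readable page to a browser and triples to a program.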

Linked Data: How? (RE’s formulation)

1. Describe things using RDF triples
2. Identify things using HTTP URIs
3. Those URIs should link to more LD (that other people have already created, whenever possible)

Linked Data: How? (RE’s even shorter reformulation)

1. Describe with RDF
2. Identify with HTTP URIs
3. Link to more LD

1. Describe with RDF

• RDF = Resource Description Framework
• F stands for framework, not file type!
• It’s a conceptual model
• “content agnostic” (can describe anything)
• Describe things using 3 terms (“RDF triples”)

1. Subject    2. Predicate     3. Object
Fred          Likes            Wilma
Fred          Date of Birth    October 2, 1973

2. Identify with HTTP URIs

1. Subject and Predicate MUST be URIs
2. Object may be URI or raw value (number, text, date, etc.)

1. Subject           2. Predicate          3. Object
Fred                 Likes                 Wilma
http://s.org/fred    http://p.org/likes    “Wilma”
http://s.org/fred    http://p.org/likes    http://o.org/wilma
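Written down in the N-Triples serialization, each of the URI-based triples becomes one line of text: URIs go in angle brackets, raw text values go in double quotes, and the line ends with a period. The following is a minimal sketch; the `Triple` class and `to_ntriples` function are made-up names for illustration, and the escaping is simplified (a real serializer also handles newlines and other special characters).

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str             # always a URI
    predicate: str           # always a URI
    obj: str                 # a URI or a raw text value
    obj_is_uri: bool = True  # False means obj is a raw text value


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if t.obj_is_uri:
        obj = f"<{t.obj}>"
    else:
        # Simplified escaping: backslashes and double quotes only.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{t.subject}> <{t.predicate}> {obj} ."


likes_uri = Triple("http://s.org/fred", "http://p.org/likes", "http://o.org/wilma")
likes_text = Triple("http://s.org/fred", "http://p.org/likes", "Wilma", obj_is_uri=False)

print(to_ntriples(likes_uri))
print(to_ntriples(likes_text))
```

Only the version whose object is a URI gives a reader somewhere further to go, which is the point of step 3.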

3. Link to more LD

• What format should the referenced LD be in?
• If I go to http://o.org/wilma, what should I see there?
• Are predicates in RDF too? http://p.org/likes


(Why are you even here?)

And it will save the world!

(Why aren’t you making Linked Data NOW?)

Montage of leading LD sites

"At first glance, the principles of Linked Data seem simple enough. However experienced Web developers, designers and architects who attempt to put these ideas into practice often find themselves having to digest and understand debates about Web architecture, the semantic web, artificial intelligence and the philosophical nature of identity.” – Ed Summers & Dorothea Salo

“LD makes my brain hurt”

• It’s OK!
• Though core concepts are very simple
• It quickly gets confusing – it’s not just you
• Accidental: partially overlapping concepts.
• Intrinsic: simple parts make complex whole
• Danger: Is it too simple? (ambiguous)

“Make things as simple as possible, but not simpler.” – Einstein (paraphrased)

“External” Overlapping Concepts

• There are a lot of things that are kinda sorta like LD!
• Semantic Web (1994)
• Web APIs (10,214 and counting)
• Facebook Open Graph?
• Schema.org and microdata? (Google, Yahoo, Microsoft)
• microformats

The Semantic Web

• Semantic Web: 1994
• “The vision of the Semantic Web is to extend principles of the Web from documents to data” – W3C
• “This simple idea [the Semantic Web]… remains largely unrealized.” – Tim Berners-Lee et al., 2006

LD and the Semantic Web

Is Linked Data (2006) a:

• Special case: narrowing and focusing?
• Redo: “The semantic web done right?”
• Addition: Semantic web + links?
• Rebranding of a troubled project?

“Internal” forms of confusion

• URI, URL, URN, IRI, CURIE
• RDF “Serializations”: RDF/XML, RDFa, N-Triples, Turtle, JSON-LD
• Ontologies vs. ontology languages vs. “schema languages” vs. plain old RDF: RDFS, OWL, FOAF
• SPARQL

Let’s make some LD!

• 5 star LD!
• That means we need to link to other LD
• So we need to identify some existing LD to link to…


• Linked Data version of Wikipedia
• Take any Wikipedia URL
• Replace “en.wikipedia.org/wiki”
• With “dbpedia.org/page”
• And you have the LD expression of that concept.

Example: our fair city

• http://en.wikipedia.org/wiki/Cambridge,_Massachusetts
• http://dbpedia.org/page/Cambridge,_Massachusetts
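The URL swap is mechanical enough to script. Here is a small sketch; the function name is made up for illustration, and it only performs the string replacement described above, without checking that DBpedia actually has a page for the article.

```python
WIKIPEDIA_PREFIX = "en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "dbpedia.org/page/"


def wikipedia_to_dbpedia(url: str) -> str:
    """Rewrite an English Wikipedia article URL into its DBpedia page URL."""
    if WIKIPEDIA_PREFIX not in url:
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    # Replace only the first occurrence, leaving the article title untouched.
    return url.replace(WIKIPEDIA_PREFIX, DBPEDIA_PREFIX, 1)


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Cambridge,_Massachusetts"))
```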

An RDF triple (in HTML)

RDF Graphs

• A set of RDF triples is called a “graph”
• Graph in this sense is a math/comp sci data structure
• Not a visual plot
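In code, a graph in this sense can be as plain as a set of three-part tuples: subjects and objects are the nodes, predicates are the labeled arrows between them. The sketch below reuses the Fred and Wilma URIs; the "Wilma likes Fred" triple and the `outgoing` helper are hypothetical additions for illustration.

```python
# A graph is a set of (subject, predicate, object) triples.
graph = {
    ("http://s.org/fred", "http://p.org/likes", "http://o.org/wilma"),
    # Stating the same triple twice adds nothing: a set keeps one copy.
    ("http://s.org/fred", "http://p.org/likes", "http://o.org/wilma"),
    # Hypothetical extra triple so the graph has an edge in each direction.
    ("http://o.org/wilma", "http://p.org/likes", "http://s.org/fred"),
}

# Every subject and every object is a node of the graph.
nodes = {s for s, _, _ in graph} | {o for _, _, o in graph}


def outgoing(graph, node):
    """List the (predicate, object) edges that leave `node`."""
    return sorted((p, o) for s, p, o in graph if s == node)


print(len(graph), "triples,", len(nodes), "nodes")
print(outgoing(graph, "http://s.org/fred"))
```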

“provide useful information using the standards…”

Cambridge in CSV (in Excel)

2409 RDF Triples about Cambridge

Let’s make an LD “comment” about Cambridge!

1. Open the “ntriples” DBpedia file and find the existing English-language comment

Using own id with sameAs




"Cambridge is a pretty cool town"@en

Stick this data:

5 Star LD!

At this URL: http://mylinkeddata.org/resource/123
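One plausible shape for that data is two N-Triples lines: a sameAs link from our own identifier to the DBpedia resource, and the comment itself. The sketch below generates them. The predicate URIs (owl:sameAs and rdfs:comment) and the DBpedia resource URI are assumptions about what belongs here rather than a copy of the original slide, and the helper names are made up.

```python
# Assumed predicate URIs from the OWL and RDF Schema vocabularies.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

MY_ID = "http://mylinkeddata.org/resource/123"
# Assumed DBpedia identifier for Cambridge (the /resource/ form of the page URL).
DBPEDIA_CAMBRIDGE = "http://dbpedia.org/resource/Cambridge,_Massachusetts"


def uri_triple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples line whose object is a URI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def text_triple(subject: str, predicate: str, text: str, lang: str) -> str:
    """One N-Triples line whose object is language-tagged text."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}"@{lang} .'


lines = [
    uri_triple(MY_ID, OWL_SAME_AS, DBPEDIA_CAMBRIDGE),
    text_triple(MY_ID, RDFS_COMMENT, "Cambridge is a pretty cool town", "en"),
]
document = "\n".join(lines) + "\n"
print(document, end="")
```

The sameAs line is what earns the fifth star: without it, the comment would be valid RDF that links to nothing.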

Summary of LD creation

• Creating RDF triples is easy
• Figuring out the right HTTP URIs to use is hard
• Figuring out how to respond to any HTTP URI requests you receive is also harder than I would like

Querying LD with SPARQL

• SPARQL: Recursive acronym for SPARQL Protocol and RDF Query Language
• RQL is the part we’re interested in
• LD’s answer to SQL
• Instead of querying tables in a db
• You query a graph of RDF triples
• Using “triple patterns” (and some other stuff)
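A triple pattern is a triple with some parts replaced by variables; the query engine returns every way of filling in the variables so that the result is a triple in the graph. The toy matcher below sketches that idea over an in-memory set of triples. It is not how a real SPARQL engine is implemented, the short names stand in for full URIs, and the Barney and Betty triple is hypothetical.

```python
def match(graph, pattern):
    """Return one {variable: value} binding per triple that fits `pattern`.

    Pattern terms starting with '?' are variables; all other terms must
    equal the corresponding part of the triple exactly.
    """
    results = []
    for triple in sorted(graph):
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice must take the same value both times.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


people = {
    ("fred", "likes", "wilma"),
    ("fred", "born", "1973-10-02"),
    ("barney", "likes", "betty"),  # hypothetical extra data
}

for row in match(people, ("?who", "likes", "?whom")):
    print(row["?who"], "likes", row["?whom"])
```

A full query is several such patterns sharing variables, which is why the same `?person` appears on every line of a typical SPARQL WHERE clause.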

Let’s query the DBpedia SPARQL endpoint!

Note: You want to point your browser to “snorql” (not sparql!): http://dbpedia.org/snorql

Not the most user friendly site…

A Query (in English)

• Show me name and dates of birth and death for people whose “main interests” are theology and nihilism

Same Query (in SPARQL)

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>

SELECT ?name ?birth ?death ?person
WHERE {
  ?person dbo:mainInterest :Nihilism .
  ?person dbo:mainInterest :Theology .
  ?person dbo:birthDate ?birth .
  ?person foaf:name ?name .
  ?person dbo:deathDate ?death .
}
ORDER BY ?name
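Behind the web form, running a query is a single HTTP GET with the query text in a "query" parameter, which is the mechanism the SPARQL protocol defines. The sketch below builds that URL without sending anything. The endpoint address is an assumption, and the query is the one from this slide with its prefix URIs written in angle brackets, as SPARQL requires.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed address of DBpedia's public SPARQL endpoint.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>

SELECT ?name ?birth ?death ?person
WHERE {
  ?person dbo:mainInterest :Nihilism .
  ?person dbo:mainInterest :Theology .
  ?person dbo:birthDate ?birth .
  ?person foaf:name ?name .
  ?person dbo:deathDate ?death .
}
ORDER BY ?name
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return the GET URL that submits `query` to a SPARQL endpoint."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_query_url(ENDPOINT, QUERY)
print(url[:60] + "...")

# Decoding the URL again recovers the query text unchanged.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded == QUERY)
```

Fetching the URL (for example with urllib.request.urlopen) and asking for a results format through the Accept header is left out here, since the formats on offer depend on the endpoint.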


More sample queries

http://wiki.dbpedia.org/OnlineAccess#h28-5

Play around with them. Swap out some parameters. Stare at your favorite DBpedia records (the ones you found by modifying Wikipedia URLs) to get ideas for other triple patterns.

If you want to run SPARQL against your own RDF data…

• Install Apache Jena (Java framework)
• Use the command line ARQ tool
• Warning: probably too geeky for most folks in this room.
• But if you’re serious about going deeper, probably unavoidable

SPARQL Summary

• SPARQL syntax harder than RDF
• But again, the hardest part seems to be figuring out what URIs to plug in
• Existing tools not very user friendly
• Promise of querying the entire Web of Data still a way off


• Regular LD sort of a parallel web of data
• RDFa and related technologies embed web of data within the web of documents
• The “a” stands for “attributes”
• Metatags on steroids
• But good, W3C doctor approved steroids!
• Sounds like an afterthought, but probably far more widely used than any other form of LD.

$$$ Rich Snippets $$$

What does RDFa look like under the hood?
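Under the hood, RDFa is ordinary HTML with a few extra attributes (vocab, typeof, resource, property) that tell a machine which bits of visible text are data. The sketch below embeds a made-up snippet and pulls the property values back out with Python's built-in HTML parser. The sample markup, the use of schema.org terms, and the `PropertyCollector` class are all illustrative assumptions, and this is nowhere near a complete RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: normal HTML plus RDFa attributes.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Place"
     resource="http://mylinkeddata.org/resource/123">
  <span property="name">Cambridge</span> is
  <span property="description">a pretty cool town</span>.
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a property attribute."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._current = None  # property name of the element being read

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop

    def handle_data(self, data):
        # Keep only the first non-empty text inside a property element.
        if self._current and data.strip():
            self.found.append((self._current, data.strip()))
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


collector = PropertyCollector()
collector.feed(SAMPLE_HTML)
for prop, text in collector.found:
    print(prop, "=", text)
```

A human reading the page sees one sentence; a crawler reading the attributes sees two triples about the resource named in the div.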


Facebook’s Open Graph API

• Graph? Sounds like LD!
• And indeed, uses RDFa
• But not “pure RDFa”
• And only for ingest
• http://graph.facebook.com/reinhard.engels
• http://graph.facebook.com/harvard
• http://graph.facebook.com/zuck

Summary of my LD experience

• Frustrated by ambiguities and the many competing ways of doing more or less the same thing
• Frustrated by disconnect between grand vision of one API for the Web of Data and the sorry little SPARQL queries I was able to run
• Not overjoyed that SEO spamming seems the one area in which LD is really succeeding


• “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” – Amara’s Law