Hacking Linked Data


for librarians!

A hands-on-exploration of an often nebulous concept

Reinhard Engels, ABCD Library, October 2013

Disclaimers

1. I am not an expert!

2. Apparently it takes more than a week to become one

3. Your brain may hurt

Goals

1. Convey some actual knowledge about LD

2. Let you pass a polygraph

3. Reassure that it’s OK to be confused

4. Lower the bar for asking “stupid questions”

Agenda for next ~80 minutes

1. Quick review of what Linked Data (LD) is

2. Look at some real LD (DBpedia, NY Times)

3. Make some simple LD (RDF “N-Triples”)

4. Query remote LD source (SPARQL on DBpedia)

5. How to embed LD in HTML (RDFa et al.)

6. Ponder things that are kinda sorta like LD

7. Recover!

Linked Data: What?

• “a set of best practices for publishing and connecting structured data on the web”

• Conceived by the guy who invented the WWW

• Web of Data

• Turns the web into a giant database

• With a single, consistent API

• Simple, elegant, familiar mechanism: URIs and “typed links”

Linked Data: Why?

• For users: enables meaningful queries instead of just text string searches; research applications, consumer applications

• For creators: efficiency of not having to redundantly create and maintain data.

• One API for all data: this is a thing of beauty in itself.

Linked Data: How? (Mug version)

Linked Data: How? (Principles)

1. Use URIs as names for things.

2. Use HTTP URIs, so that people can look up those names.

3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

4. Include links to other URIs, so that they can discover more things.

Linked Data: How? (RE’s formulation)

1. Describe things using RDF triples

2. Identify things using HTTP URIs

3. Those URIs should link to more LD (that other people have already created, whenever possible)

Linked Data: How? (RE’s even shorter reformulation)

1. Describe with RDF

2. Identify with HTTP URIs

3. Link to more LD

1. Describe with RDF

• RDF = Resource Description Framework

• F stands for framework, not file type!

• It’s a conceptual model

• “content agnostic” (can describe anything)

• Describe things using 3 terms (“RDF triples”)

1. Subject    2. Predicate     3. Object
Fred          Likes            Wilma
Fred          Date of Birth    October 2, 1973

2. Identify with HTTP URIs

1. Subject and Predicate MUST be URIs

2. Object may be URI or raw value (number, text, date, etc.)

                In English    With a raw value object    With a URI object
1. Subject      Fred          http://s.org/fred          http://s.org/fred
2. Predicate    Likes         http://p.org/likes         http://p.org/likes
3. Object       Wilma         “Wilma”                    http://o.org/wilma
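The Fred and Wilma triple can be written down as N-Triples, one line per triple. What follows is a minimal Python sketch, assuming the made-up example URIs from the slide. `to_ntriple` is a hypothetical helper, not part of any library, and it only escapes backslashes and double quotes in literals.

```python
def to_ntriple(subject, predicate, obj, obj_is_uri=True):
    """Format one triple as an N-Triples line (hypothetical helper)."""
    if obj_is_uri:
        obj_term = f"<{obj}>"
    else:
        # Literal object: wrap in double quotes, escaping backslashes and quotes.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    return f"<{subject}> <{predicate}> {obj_term} ."


# Object as a URI: a reader can follow it to find out more about Wilma.
uri_line = to_ntriple("http://s.org/fred", "http://p.org/likes", "http://o.org/wilma")

# Object as a raw text value: valid RDF, but there is nothing to follow.
literal_line = to_ntriple(
    "http://s.org/fred", "http://p.org/likes", "Wilma", obj_is_uri=False
)

print(uri_line)
print(literal_line)
```

The subject and predicate always go in angle brackets because they must be URIs. Only the object gets a choice.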

3. Link to more LD

• What format should the referenced LD be in?

• If I go to http://o.org/wilma , what should I see there?

• Are predicates in RDF too? http://p.org/likes

It’s SO EASY

(Why are you even here?)

And it will save the world!

(Why aren’t you making Linked Data NOW?)

Montage of leading LD sites

“At first glance, the principles of Linked Data seem simple enough.

However experienced Web developers, designers and architects who attempt to put these ideas into practice often find themselves having to digest and understand debates about Web architecture, the semantic web, artificial intelligence and the philosophical nature of identity.”

– Ed Summers & Dorothea Salo

“LD makes my brain hurt”

• It’s OK!

• Though core concepts are very simple

• It quickly gets confusing – it’s not just you

• Accidental: partially overlapping concepts.

• Intrinsic: simple parts make complex whole

• Danger: Is it too simple? (ambiguous)

“Make things as simple as possible, but not simpler.” – Einstein (paraphrased)

“External” Overlapping Concepts

• There are a lot of things that are kinda sorta like LD!

• Semantic Web (1994)

• Web APIs (10,214 and counting)

• Facebook Open Graph?

• Schema.org and microdata? (Google, Yahoo, Microsoft)

• microformats

The Semantic Web

• Semantic Web: 1994

• “The vision of the Semantic Web is to extend principles of the Web from documents to data” – W3C

• “This simple idea [the Semantic Web]… remains largely unrealized.” – Tim Berners-Lee et al., 2006

LD and the Semantic Web

Is Linked Data (2006) a:

• Special case: narrowing and focusing?

• Redo: “The semantic web done right?”

• Addition: Semantic web + links?

• Rebranding of a troubled project?

“Internal” forms of confusion

• URI, URL, URN, IRI, CURIE

• RDF “Serializations”: RDF/XML, RDFa, N-Triples, Turtle, JSON-LD

• Ontologies vs. ontology languages vs. “schema languages” vs. plain old RDF: RDFS, OWL, FOAF

• SPARQL

Let’s make some LD!

• 5 star LD!

• That means we need to link to other LD

• So we need to identify some existing LD to link to…

DBpedia

• Linked Data version of Wikipedia

• Take any Wikipedia URL

• Replace “en.wikipedia.org/wiki”

• With “dbpedia.org/page”

• And you have the LD expression of that concept.

Example: our fair city

• http://en.wikipedia.org/wiki/Cambridge,_Massachusetts

• http://dbpedia.org/page/Cambridge,_Massachusetts
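The URL swap described above is mechanical enough to script. Here is a small Python sketch, assuming English Wikipedia URLs of the form shown in the example. `wikipedia_to_dbpedia` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit


def wikipedia_to_dbpedia(url):
    """Swap en.wikipedia.org/wiki for dbpedia.org/page (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    # Keep the article name, replace the host and the leading path segment.
    article = parts.path[len("/wiki/"):]
    return urlunsplit((parts.scheme, "dbpedia.org", "/page/" + article, "", ""))


cambridge = wikipedia_to_dbpedia(
    "http://en.wikipedia.org/wiki/Cambridge,_Massachusetts"
)
print(cambridge)
```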

An RDF triple (in HTML)

RDF Graphs

• A set of RDF triples is called a “graph”

• Graph in this sense is a math/comp sci data structure

• Not a visual plot
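To make the "data structure, not a plot" point concrete, here is a rough Python sketch that stores a graph as a plain set of triples. The URIs reuse the made-up Fred and Wilma examples. `http://p.org/dateOfBirth` and the second "likes" triple are invented for illustration, and `outgoing` is a hypothetical helper.

```python
# A graph is just a set of (subject, predicate, object) triples.
graph = {
    ("http://s.org/fred", "http://p.org/likes", "http://o.org/wilma"),
    ("http://s.org/fred", "http://p.org/dateOfBirth", "1973-10-02"),
    ("http://o.org/wilma", "http://p.org/likes", "http://s.org/fred"),
}

# Because it is a set, adding a triple that is already there changes nothing.
graph.add(("http://s.org/fred", "http://p.org/likes", "http://o.org/wilma"))


def outgoing(graph, node):
    """Return the (predicate, object) edges leaving a node, sorted for stable output."""
    return sorted((p, o) for s, p, o in graph if s == node)


fred_edges = outgoing(graph, "http://s.org/fred")
for predicate, obj in fred_edges:
    print(predicate, "->", obj)
```

Subjects and objects are the nodes, and predicates are the labeled edges between them.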

“provide useful information using the standards…”

Cambridge in CSV (in Excel)

2409 RDF Triples about Cambridge

Let’s make an LD “comment” about Cambridge!

1. Open the “ntriples” DBpedia file and find the existing English language comment
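With a couple of thousand triples in the file, scanning by eye gets tedious. The following Python sketch filters N-Triples text for `rdfs:comment` lines tagged `@en`. The sample lines are placeholders written for this example, not real DBpedia content, and `find_comments` is a hypothetical helper that only understands simple language-tagged literals.

```python
import re

RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# Matches lines of the form: <subject> <predicate> "literal"@lang .
LITERAL_TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+"((?:[^"\\]|\\.)*)"@([A-Za-z-]+)\s*\.\s*$'
)


def find_comments(ntriples_text, lang="en"):
    """Return the text of every rdfs:comment literal in the given language."""
    found = []
    for line in ntriples_text.splitlines():
        match = LITERAL_TRIPLE.match(line.strip())
        if match and match.group(2) == RDFS_COMMENT and match.group(4) == lang:
            found.append(match.group(3))
    return found


# Placeholder data in the shape of the DBpedia file (not the real comments).
sample = """\
<http://dbpedia.org/resource/Cambridge,_Massachusetts> <http://www.w3.org/2000/01/rdf-schema#comment> "Placeholder English comment."@en .
<http://dbpedia.org/resource/Cambridge,_Massachusetts> <http://www.w3.org/2000/01/rdf-schema#comment> "Platzhalter-Kommentar."@de .
<http://dbpedia.org/resource/Cambridge,_Massachusetts> <http://www.w3.org/2002/07/owl#sameAs> <http://example.org/cambridge> .
"""

english = find_comments(sample)
print(english)
```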

Using own id with sameAs

Triple 1

Subject: <http://mylinkeddata.org/resource/123>

Predicate: <http://www.w3.org/2002/07/owl#sameAs>

Object: <http://dbpedia.org/resource/Cambridge,_Massachusetts>

Triple 2

Subject: <http://mylinkeddata.org/resource/123>

Predicate: <http://www.w3.org/2000/01/rdf-schema#comment>

Object: "Cambridge is a pretty cool town"@en

Stick this data at this URL: http://mylinkeddata.org/resource/123

5 Star LD!

Summary of LD creation

• Creating RDF triples is easy

• Figuring out the right HTTP URIs to use is hard

• Figuring out how to respond to any HTTP URI requests you receive is also harder than I would like
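One common way to handle the last point (not something these slides prescribe) is content negotiation: look at the request's `Accept` header and send RDF to machines and HTML to browsers. Here is a deliberately simplified Python sketch. `choose_format` is a hypothetical helper, it ignores q-values, and the list of RDF media types is just the handful used for illustration here.

```python
RDF_TYPES = ("text/turtle", "application/n-triples", "application/rdf+xml")
HTML_TYPES = ("text/html", "application/xhtml+xml")


def choose_format(accept_header):
    """Pick a response media type from an Accept header (simplified: no q-values)."""
    # Strip parameters such as ";q=0.5" and normalize case.
    offered = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    for media_type in offered:
        if media_type in RDF_TYPES:
            return media_type
        if media_type in HTML_TYPES:
            return "text/html"
    # Nothing recognizable was asked for: fall back to a human-readable page.
    return "text/html"


print(choose_format("text/turtle, text/html;q=0.5"))
print(choose_format("text/html,application/xhtml+xml"))
```

A real server would also have to honor q-values and decide between redirecting and answering directly, which is part of why this step is harder than it looks.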

Querying LD with SPARQL

• SPARQL: Recursive acronym for SPARQL Protocol and RDF Query Language

• RQL is the part we’re interested in

• LD’s answer to SQL

• Instead of querying tables in a db

• You query a graph of rdf triples

• Using “triple patterns” (and some other stuff)
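A triple pattern is a triple with some positions replaced by variables. The toy Python matcher below shows the idea on an in-memory set of triples. It is only a sketch: it uses short labels instead of real URIs, handles a single pattern at a time, and `match_pattern` is a hypothetical helper rather than anything a SPARQL engine exposes.

```python
def match_pattern(graph, pattern):
    """Return one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            # No break: every position matched, so keep this binding.
            results.append(binding)
    return results


# Made-up data using short labels in place of URIs.
graph = {
    ("fred", "likes", "wilma"),
    ("barney", "likes", "betty"),
    ("fred", "bornIn", "bedrock"),
}

likers = sorted(b["?person"] for b in match_pattern(graph, ("?person", "likes", "?whom")))
print(likers)
```

A full query is several such patterns sharing variables, with the engine keeping only the bindings that satisfy all of them.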

Let’s query the DBPedia SPARQL endpoint!

Note: You want to point your browser to “snorql” (not sparql!): http://dbpedia.org/snorql

Not the most user friendly site…

A Query (in English)

• Show me name and dates of birth and death for people whose “main interests” are theology and nihilism

Same Query (in SPARQL)

# Prefixes are shorthand for long URIs; the URIs go in angle brackets.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>

SELECT ?name ?birth ?death ?person WHERE {
  # Each line is a triple pattern; ?person must satisfy all of them.
  ?person dbo:mainInterest :Nihilism .
  ?person dbo:mainInterest :Theology .
  ?person dbo:birthDate ?birth .
  ?person foaf:name ?name .
  ?person dbo:deathDate ?death .
}
ORDER BY ?name

http://bit.ly/1ip6leF
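The same query can be sent from a script instead of the browser form. The Python sketch below only builds the HTTP request. It assumes that DBpedia's raw endpoint lives at http://dbpedia.org/sparql and that it honors the standard SPARQL JSON results media type; `build_request` is a hypothetical helper. The network call itself is left commented out.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # assumed location of the raw endpoint

QUERY = """\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX : <http://dbpedia.org/resource/>
SELECT ?name ?birth ?death ?person WHERE {
  ?person dbo:mainInterest :Nihilism .
  ?person dbo:mainInterest :Theology .
  ?person dbo:birthDate ?birth .
  ?person foaf:name ?name .
  ?person dbo:deathDate ?death .
}
ORDER BY ?name
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a GET request carrying the query in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY)
print(request.full_url[:60] + "...")

# To actually run it (needs network access):
#   import json
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rows = json.load(response)["results"]["bindings"]
```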

More sample queries

http://wiki.dbpedia.org/OnlineAccess#h28-5

Play around with them. Swap out some parameters. Stare at your favorite DBpedia records (the ones you found by modifying Wikipedia URLs) to get ideas for other triple patterns.

If you want to run SPARQL against your own RDF data…

• Install Apache Jena (Java framework)

• Use the command line ARQ tool

• Warning: probably too geeky for most folks in this room.

• But if you’re serious about going deeper, probably unavoidable

SPARQL Summary

• SPARQL syntax harder than RDF

• But again, the hardest part seems to be figuring out what URIs to plug in

• Existing tools not very user friendly

• Promise of querying the entire Web of Data still a way off

RDFa

• Regular LD sort of a parallel web of data

• RDFa and related technologies embed web of data within the web of documents

• The “a” stands for “attributes”

• Metatags on steroids

• But good, W3C doctor approved steroids!

• Sounds like an afterthought, but probably far more widely used than any other form of LD.

$$$ Rich Snippets $$$

What does RDFa look like under the hood?

http://en.wikipedia.org/wiki/RDFa
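Under the hood, RDFa is ordinary HTML with a few extra attributes such as `vocab`, `about`, `typeof` and `property`. The Python sketch below pulls `property` values out of a made-up snippet. The HTML is invented for this example (reusing the Fred and Wilma URIs with FOAF terms), and `PropertyCollector` is a hypothetical class that skips most of the real RDFa processing rules.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from RDFa-style `property` attributes."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property still waiting for its element text

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # Value given explicitly in the markup.
            self.pairs.append((prop, attributes["content"]))
        elif "href" in attributes:
            # Value is a link, i.e. a URI object.
            self.pairs.append((prop, attributes["href"]))
        else:
            # Value is the visible text of the element.
            self._pending = prop

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.pairs.append((self._pending, data.strip()))
            self._pending = None


# Invented snippet: a human-readable page with machine-readable attributes.
sample_html = """
<div vocab="http://xmlns.com/foaf/0.1/" about="http://s.org/fred" typeof="Person">
  <span property="name">Fred</span>
  <a property="knows" href="http://o.org/wilma">Wilma</a>
  <meta property="birthday" content="10-02">
</div>
"""

collector = PropertyCollector()
collector.feed(sample_html)
print(collector.pairs)
```

The browser shows "Fred" and a link to Wilma, while a crawler reading the attributes gets three triples about http://s.org/fred from the same markup.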

Facebook’s Open Graph API

• Graph? Sounds like LD!

• And indeed, uses RDFa

• But not “pure RDFa”

• And only for ingest

• http://graph.facebook.com/reinhard.engels

• http://graph.facebook.com/harvard

• http://graph.facebook.com/zuck

Summary of my LD experience

• Frustrated by ambiguities and the many competing ways of doing more or less the same thing

• Frustrated by disconnect between grand vision of one API for the Web of Data and the sorry little SPARQL queries I was able to run

• Not overjoyed that SEO spamming seems the one area in which LD is really succeeding

But…

• “We tend to overestimate the effect of a technology in the short run and underestimate the effect in the long run.” – Amara’s Law
