Game: Semantic stew

Semantic Stew
the perils and potential of linked open data
The promise of linked open data, an element of the Semantic Web initiative, is that through the
aggregation of simple statements about named entities, computers can make useful inferences that provide
us with more and better information. Instead of being trapped within the unstructured Web of documents,
information expressed as linked open data can be more easily synthesized, manipulated, searched, and
otherwise processed by software applications. This is the potential fulfilment of Paul Otlet’s dream of a
world knowledge base unencumbered by pesky document structure.
A great benefit of the linked open data approach is its extreme simplicity. Linked open data relies on just
a few basic elements:
• Unique identifiers (URIs, of which the URL is a subset) to distinguish entities, properties, and
relations (things and concepts).
• RDF, the Resource Description Framework, a way to describe the world through the expression
of triples (subjects, predicates, and objects).
RDF triples are modeled as nodes and arcs in a graph. The subject and the object are nodes, and the
predicate is a directed arc from the subject to the object. Here is an example from the W3C RDF Primer:
This example describes a Web page (identified by a URI) as being created by a particular person
(identified by a URI), as being in the English language (as identified by a string for the international code
for English, en), and as having the creation date of August 16, 1999 (again as identified by a string). All
the predicates are identified by URIs; two of the predicates are from the Dublin Core standard for basic
metadata terms.
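As a rough sketch, the statements described above can also be written down as plain Python tuples, one per triple. The URIs are recalled from the primer's example and should be treated as illustrative; the variable names and the is_uri helper are assumptions of this sketch, not part of RDF.

```python
# Each triple is (subject, predicate, object).
PAGE = "http://www.example.org/index.html"   # the Web page being described
DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element namespace

triples = [
    (PAGE, DC + "creator", "http://www.example.org/staffid/85740"),
    (PAGE, "http://www.example.org/terms/creation-date", "August 16, 1999"),
    (PAGE, DC + "language", "en"),
]


def is_uri(value):
    """Crude check (an assumption of this sketch): URIs here start with http://."""
    return value.startswith("http://")


# Every predicate is a URI; objects are either URIs (nodes) or plain strings.
for subject, predicate, obj in triples:
    kind = "URI node" if is_uri(obj) else "string"
    print(f"{subject} --[{predicate}]--> {obj} ({kind})")
```

Each tuple corresponds to one directed arc in the graph: the first and last items are the nodes, and the middle item labels the arc.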
This simplicity makes linked open data easy to generate and disseminate. However, with this simplicity
comes the potential for semantic conflict, when the same object (as referenced with the same URI) is
described in conflicting ways, or when the same property (as referenced with the same URI) is used in
different ways to describe different objects. For example, the datastore collected by DBpedia, which
expresses Wikipedia infobox material as RDF, states that both Diana Ross and Amy Winehouse are
associated with the Jazz genre. We can debate whether these statements accurately portray Diana Ross,
Amy Winehouse, or jazz. It is also theoretically possible to write a triple that associates Bugs Bunny
with Jazz, or the stray cat that lives next door with Jazz.
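To make the point concrete, here is a minimal sketch of how such statements pile up on a single object. The short labels (Diana_Ross, genre, Jazz) are placeholders chosen for this sketch, not the actual DBpedia URIs, and the Bugs Bunny triple is the hypothetical one from the paragraph above.

```python
from collections import defaultdict

# Placeholder labels stand in for full URIs (an assumption of this sketch).
triples = [
    ("Diana_Ross", "genre", "Jazz"),
    ("Amy_Winehouse", "genre", "Jazz"),
    ("Bugs_Bunny", "genre", "Jazz"),  # nothing in basic RDF rejects this statement
]

# Group subjects by the (predicate, object) pair they point at.
subjects_by_object = defaultdict(set)
for subject, predicate, obj in triples:
    subjects_by_object[(predicate, obj)].add(subject)

print(sorted(subjects_by_object[("genre", "Jazz")]))
# prints ['Amy_Winehouse', 'Bugs_Bunny', 'Diana_Ross']
```

The data structure accepts all three statements equally. Deciding whether any of them is a problem is left to whoever consumes the data.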
While it is possible to use more elaborate Semantic Web technologies, such as OWL (the Web Ontology
Language), to constrain meaning, this gets complicated quickly, and the beautiful simplicity of basic
RDF is lost.
So should we just throw up our hands and declare linked open data an inevitable mess of catastrophic
proportions? Well, maybe those semantic conflicts don’t really matter that much most of the time,
especially with enough data out there. Determining which semantic conflicts do matter, in what contexts
they matter, and what to do about them will be a key skill in the ultimate success of linked open data
applications. (This is precisely the kind of task that you, my friends, should consider yourselves poised to
undertake!)
In this game, we will see how this kind of situation might play out. In groups, you will create triples
using defined sets of subjects, predicates, and objects, and you will draw your suite of triples as a graph.
Then you will see how your graphs integrate with those of other groups, some of whom received the same set of
cards as your group, and some of whom received slightly different cards. Is there semantic conflict? Does
it matter?
Step 1. Create triples and draw them as a graph. (25 minutes)
Your group will receive a set of index cards. Your cards will be marked with either a P or an O (for Paul
Otlet, a historical precursor of the Semantic Web).
Each set of cards includes a group of subjects, a group of objects, and a predicate or two. For each
subject, create triples using each appropriate predicate/object combination. Draw your triples on the
blank paper provided. Try to get all the triples for your set on a single sheet of paper, so that you can
show the different subjects linked to a single object (like Diana Ross and Amy Winehouse for Jazz).
Step 2. Integrate your triples with another group of the same letter. (20 minutes)
If your cards were marked with a P, find another P group; if your cards were marked with an O, find
another O group.
Draw another graph that aggregates the statements both groups have made. Is there semantic conflict in
the resulting graph? Does it matter? If it does, how would you handle it?
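For those who want to see the aggregation in software terms, it might be sketched as a set union, followed by a check for subject/predicate pairs that end up with more than one object. The triples below are invented for illustration and are not the contents of any card set. The candidate_conflicts function is a hypothetical helper, and having several objects is only a prompt for discussion, not proof of a conflict.

```python
from collections import defaultdict

# Hypothetical triples drawn by two groups (not the actual card contents).
group_a = {("Diana_Ross", "genre", "Jazz"), ("Diana_Ross", "genre", "Soul")}
group_b = {("Diana_Ross", "genre", "Jazz"), ("Diana_Ross", "genre", "Disco")}

# Aggregating graphs is a set union: identical triples collapse into one.
merged = group_a | group_b


def candidate_conflicts(triples):
    """Return (subject, predicate) pairs that have more than one object."""
    objects = defaultdict(set)
    for subject, predicate, obj in triples:
        objects[(subject, predicate)].add(obj)
    return {key: objs for key, objs in objects.items() if len(objs) > 1}


for (subject, predicate), objs in sorted(candidate_conflicts(merged).items()):
    print(subject, predicate, sorted(objs))
```

Whether the flagged pairs are real conflicts is exactly the question the game asks: a singer can plausibly belong to several genres, so the check only shows where to look.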
Step 3. Integrate your triples with a group of the other letter. (20 minutes)
If your cards were marked with a P, find an O group; if your cards were marked with an O, find a P
group.
Draw another graph that aggregates the statements both groups have made. Is there semantic conflict in
the resulting graph? Does it matter? If it does, how would you handle it?