Semantic Stew: The Perils and Potential of Linked Open Data

The promise of linked open data, an element of the Semantic Web initiative, is that through the aggregation of simple statements about named entities, computers can make useful inferences that provide us with more and better information. Instead of being trapped within the unstructured Web of documents, information expressed as linked open data can be more easily synthesized, manipulated, searched, and otherwise processed by software applications. This is the potential fulfilment of Paul Otlet’s dream of a world knowledge base unencumbered by pesky document structure.

A great benefit of the linked open data approach is its extreme simplicity. Linked open data relies on just a few basic elements:

Unique identifiers (URIs, of which the URL is a subset) to distinguish entities, properties, and relations (things and concepts).

RDF, the Resource Description Framework, a way to describe the world through the expression of triples (subjects, predicates, and objects).

RDF triples are modeled as nodes and arcs in a graph. The subject and the object are nodes, and the predicate is a directed arc from the subject to the object. Here is an example from the W3C RDF Primer:

This example describes a Web page (identified by a URI) as being created by a particular person (identified by a URI), as being in the English language (identified by a string holding the international code for English, en), and as having the creation date of August 16, 1999 (again identified by a string). All the predicates are identified by URIs; two of the predicates are from the Dublin Core standard for basic metadata terms.

This simplicity makes linked open data easy to generate and disseminate.
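To make the nodes-and-arcs idea concrete, the Primer example described above can be sketched as plain Python tuples. This is only an illustration, not an RDF library: the URIs are given as they appear in the W3C RDF Primer to the best of my recollection, and the helper name to_graph is made up for this sketch.

```python
from collections import defaultdict

# URIs as recalled from the W3C RDF Primer example; treat them as illustrative.
PAGE = "http://www.example.org/index.html"
DC = "http://purl.org/dc/elements/1.1/"        # Dublin Core element namespace
TERMS = "http://www.example.org/terms/"        # example-specific vocabulary

# Each triple is (subject, predicate, object).
triples = [
    (PAGE, DC + "creator", "http://www.example.org/staffid/85740"),
    (PAGE, DC + "language", "en"),
    (PAGE, TERMS + "creation-date", "August 16, 1999"),
]


def to_graph(triples):
    """Hypothetical helper: group triples as node -> list of (arc label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(triples)

# Print every directed arc that leaves the Web page node.
for predicate, obj in graph[PAGE]:
    print(f"{PAGE} --[{predicate}]--> {obj}")
```

The subject appears once as a node, and each predicate becomes a labeled arc pointing at an object, which is the same picture you will draw by hand in the game.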
However, with this simplicity comes the potential for semantic conflict: the same object (as referenced with the same URI) may be described in conflicting ways, or the same property (as referenced with the same URI) may be used in different ways to describe different objects. For example, the datastore collected by DBpedia, which expresses Wikipedia infobox material as RDF, states that both Diana Ross and Amy Winehouse are associated with the Jazz genre. We can debate whether these statements accurately portray Diana Ross, Amy Winehouse, or jazz. It is also theoretically possible to write a triple that associates Bugs Bunny with Jazz, or the stray cat that lives next door with Jazz. While it is possible to use more elaborate Semantic Web technologies, such as OWL (the Web Ontology Language), to constrain meaning, this gets quite complicated quite fast, and the beautiful simplicity of basic RDF is lost.

So should we just throw up our hands and declare linked open data an inevitable mess of catastrophic proportions? Well, maybe those semantic conflicts don’t really matter that much most of the time, especially with enough data out there. Determining which semantic conflicts do matter, in what contexts they matter, and what to do about them will be a key skill in the ultimate success of linked open data applications. (This is precisely the kind of task that you, my friends, should consider yourselves poised to undertake!)

In this game, we will see how this kind of situation might play out. In groups, you will create triples using defined sets of subjects, predicates, and objects, and you will draw your suite of triples as a graph. Then you will see how your graph integrates with those of other groups, some of whom received the same set of cards as your group, and some of whom received slightly different cards. Is there semantic conflict? Does it matter?

Step 1. Create triples and draw them as a graph. (25 minutes)

Your group will receive a set of index cards.
Your cards will be marked with either a P or an O (for Paul Otlet, historic precursor figure to the Semantic Web). Each set of cards includes a group of subjects, a group of objects, and a predicate or two. For each subject, create triples using each appropriate predicate/object combination. Draw your triples on the blank paper provided. Try to get all the triples for your set on a single sheet of paper, so that you can show different subjects linked to a single object (like Diana Ross and Amy Winehouse for Jazz).

Step 2. Integrate your triples with another group of the same letter. (20 minutes)

If your cards were marked with a P, find another P group; if your cards were marked with an O, find another O group. Draw another graph that aggregates the statements both groups have made. Is there semantic conflict in the resulting graph? Does it matter? If it does, how would you handle it?

Step 3. Integrate your triples with a group of the different letter. (20 minutes)

If your cards were marked with a P, find an O group; if your cards were marked with an O, find a P group. Draw another graph that aggregates the statements both groups have made. Is there semantic conflict in the resulting graph? Does it matter? If it does, how would you handle it?
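The aggregation in Steps 2 and 3 can also be sketched in code. The sketch below assumes one very crude stand-in for "semantic conflict": the same subject and predicate paired with more than one object after merging. That definition is an assumption made for illustration, and so are the group names, the predicate "genre", and the card contents. Whether a flagged pair is a real conflict or just a subject with several legitimate values is exactly the judgment the game asks you to make.

```python
from collections import defaultdict

# Hypothetical card sets: group names, predicate, and values are invented.
groups = {
    "P1": {
        ("Diana Ross", "genre", "Jazz"),
        ("Amy Winehouse", "genre", "Jazz"),
    },
    "P2": {
        ("Diana Ross", "genre", "Soul"),
        ("Amy Winehouse", "genre", "Jazz"),
    },
}


def merge(groups):
    """Aggregate every group's triples into one graph (a set, so duplicates collapse)."""
    merged = set()
    for triples in groups.values():
        merged |= triples
    return merged


def candidate_conflicts(groups):
    """Flag (subject, predicate) pairs that end up with more than one object.

    Returns {(subject, predicate): {object: [groups that asserted it]}}.
    This is a proxy only; multiple objects are not necessarily a conflict.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for name, triples in groups.items():
        for subject, predicate, obj in triples:
            seen[(subject, predicate)][obj].add(name)
    return {
        key: {obj: sorted(names) for obj, names in objects.items()}
        for key, objects in seen.items()
        if len(objects) > 1
    }


merged = merge(groups)
conflicts = candidate_conflicts(groups)

for (subject, predicate), objects in conflicts.items():
    print(subject, predicate, objects)
```

Keeping track of which group made each statement is a deliberate choice here: when you decide how to handle a conflict, knowing the source of each claim is often the first thing you will want.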