Running head: CONSIDERING RDF

Considering RDF
Erin R. Clark
San José State University

Author Note
Erin R. Clark, School of Library and Information Science, San José State University.
This paper was written in partial fulfillment of the requirements for Seminar in Contemporary Issues: Metadata, a graduate course offered at San José State University’s School of Library and Information Science.
Correspondence concerning this paper should be directed to Erin Clark, a student of the School of Library and Information Science, San José State University, San José, CA 95192. E-mail: erin.clark@sjsu.desire2learn.com

Abstract
Resource Description Framework, commonly known as RDF, is a language used to conceptually represent information about resources that can be identified by a URI (uniform resource identifier); it is especially useful for representing the metadata of Web resources. This paper lays out the basics of RDF, including the history and purpose of this data model, related schemes, some implementations, helpful literature and documentation materials, a general description of the framework, discussion on how it can be used, and a consideration of its strengths and weaknesses.

Considering RDF
The Resource Description Framework (RDF) is a general-purpose language that was originally designed as a conceptual data model for representing information about entities and relationships present in resources found or identifiable on the World Wide Web. This paper gives an overview of RDF, beginning with the history and purpose of this data model, but also covering some related schemes and a few implementations. A brief review of some helpful literature and available documentation materials is followed by a general description of the framework and discussion on how it can be used. Consideration of RDF’s strengths and weaknesses concludes the paper.
History and Purpose
After working on its specification of RDF for about two years beginning in 1997, the World Wide Web Consortium (W3C) published its first recommendation (Lassila & Swick, 1999). This effort was inspired by work done by Ramanathan Guha and Tim Bray to combine Guha’s Meta Content Framework (MCF), a specification for structuring metadata about web sites and other data in the form of objects attached to properties, with a human- and machine-readable encoding format known as Extensible Markup Language (XML) (Bray, 2003, The history of RDF). Brickley (2001) lists Dublin Core metadata and the Warwick Framework, an architecture enabling the exchange of metadata across different systems, as other important influences. These different strands were woven together with the specification for a Uniform Resource Identifier (URI), a string identifying the name of a Web resource, by a working group that later became the RDF Model and Syntax Working Group. This was a lot to cover in a single document, however, and the original RDF specification from 1999 was updated and jointly replaced in 2004 by a set of six new documents (Powers, 2009).

The purpose of RDF is to provide a way for statements to be connected across Web resources that are built separately and in different systems; it provides a model in which vocabularies in XML can be merged with other vocabularies. RDF enables the aggregation of meaning and the encoding of knowledge, which allows for data interchange on the Web in a way that can be understood by software. According to Tauberer (n.d.), RDF is a good fit when data must be integrated across different sources without custom programming, when data is being offered to other parties for reuse, when data is distributed, or when a foundation that is not tied to any proprietary technology is needed for building a tool to do “something fancy with large amounts of data.”
In addition to the metadata schemes and standards mentioned above as being ancestrally related to RDF, other related standards include RDF Schema (RDFS), Web Ontology Language (OWL), and RDF in Attributes (RDFa). The functionality established through the use of these standards is the necessary base upon which the Semantic Web is built. RDF has been used to implement the description of web resources, with Dublin Core elements specifying the nature of relationships. Applications made possible by RDF include Friend of a Friend (FOAF), which describes the relationships people have with other people or entities (such as jobs) and builds up vast networks of connections that are machine-interpretable (Dumbill, 2002), and the versions of RSS known as RDF Site Summary, in which a website’s summary information is connected to a link and which later came to be used as a syndication format (RSS-DEV Working Group, 2000). RDF could also be used for linking enterprise data (Hyland, 2010, p. 61).

Literature Review
There is a wealth of information about RDF to be found online. The W3C hosts and maintains the six documents making up the 2004 specification as a W3C Recommendation, the most mature stage of development for a standard. First, there is a lengthy but informal primer (Manola & Miller, 2004) that covers a smattering of the material available in some of the other documents as it describes RDF and RDF Schema from both the conceptual side and the more technical notational side, along with some RDF applications. Another document describes the fundamental nature of the RDF framework and ties the graph data model to abstract specifications of the syntax (Klyne & Carroll, 2004); due to its conceptual nature, this document may provide the easiest starting point for readers who are unfamiliar with XML.
Beckett (2004) provides a more detailed specification of the recommended serialization format for RDF, RDF/XML, which is likely to be more difficult for less technical readers to follow. Also highly technical, Hayes (2004) gives a precise specification of the semantics and inference rules used by RDF and RDFS. RDF Schema, a vocabulary description language that semantically extends the basic framework of RDF with resource classes and relational properties, is documented more conceptually by Brickley and Guha (2004). Finally, Grant and Beckett (2004) provide a test suite for RDF that goes over types of tests and lists the approved test cases for use by RDF implementers. Together, these documents replace Lassila and Swick’s original 1999 specification of the data model and the syntax of RDF. They are probably also some of the most useful documents for gaining a complete understanding of RDF. Sikos (n.d.) gives a useful tutorial covering the most important points on RDF, RDFS, and RDF/XML, as well as two non-XML serializations, or notational syntaxes, known as N3 and Turtle.

Description of the Scheme
What is RDF? It is essentially a data model representing statements about Web resources as a “[directed and labeled] graph of nodes and arcs representing the resources, and their properties and values” (Manola & Miller, 2004, Introduction, para. 3). Since RDF uses URIs rather than URLs, it is even possible to represent information about resources that can be identified but not retrieved on the Web. In RDF, these identifiable resources are described by making statements that assign values to their properties. Statements are constructed out of subjects, predicates, and objects; these three pieces comprise a triple.
Rather intuitively, subjects are what the statement is about, predicates are the properties or characteristics of the subjects that are specified by the statement, and objects are the values of those properties. Statements can be modeled by graphs made up of a subject node, a predicate (or property) relationship that is an edge pointing from the subject to the object, and an object node. The simplest version of this type of graph would look like this:

Figure 1. A simple RDF graph (from Klyne & Carroll, 2004)

Each of these three parts can be identified by a URI reference (URIref), although object nodes may instead be constant values called literals. Additionally, blank subject or object nodes can be inserted if needed to aggregate concepts and change n-ary relationships into binary relationships, as when it is desirable to break structured values into constituent parts. This strategy could also be used in cases of metonymy or for reified triples that involve statements about statements. An identifier for a blank node is not actually considered part of the graph; it is a distinct blank node identifier that only has significance within the triples representing a single graph.

Figure 2. Using a blank node in a graph (from Manola & Miller, 2004).

Statements are also representable with what are known as RDF triples, in which the first part of the triple encodes the subject, the second part the predicate, and the final part the object. The triples corresponding to the graph in Figure 2 would look like the following, using QName shorthand for the URIrefs:

    exstaff:85740   exterms:address      _:johnaddress .
    _:johnaddress   exterms:street       "1501 Grant Avenue" .
    _:johnaddress   exterms:city         "Bedford" .
    _:johnaddress   exterms:state        "Massachusetts" .
    _:johnaddress   exterms:postalCode   "01730" .
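The triples for Figure 2 can also be held in an ordinary data structure. The following is a minimal Python sketch rather than any standard RDF library: the names NAMESPACES, Literal, expand, objects, and address_of are hypothetical, and the exstaff namespace URI is an assumption borrowed from the example in Manola and Miller (2004). It shows how a blank node groups the parts of a structured value and how a QName stands in for a full URIref.

```python
# Prefix table for the QName shorthand. The exstaff namespace URI is an
# assumption taken from the RDF Primer example, not from this paper.
NAMESPACES = {
    "exterms": "http://www.example.org/terms/",
    "exstaff": "http://www.example.org/staffid/",
}


class Literal(str):
    """A constant value, as opposed to a URIref or a blank node."""


# The five statements of Figure 2 as (subject, predicate, object) tuples.
# "_:johnaddress" is a blank node identifier, meaningful only in this graph.
TRIPLES = [
    ("exstaff:85740", "exterms:address", "_:johnaddress"),
    ("_:johnaddress", "exterms:street", Literal("1501 Grant Avenue")),
    ("_:johnaddress", "exterms:city", Literal("Bedford")),
    ("_:johnaddress", "exterms:state", Literal("Massachusetts")),
    ("_:johnaddress", "exterms:postalCode", Literal("01730")),
]


def expand(qname):
    """Turn a QName such as 'exterms:address' into a full URIref.

    Blank node identifiers ("_:...") have no namespace and are not expanded.
    """
    prefix, local_name = qname.split(":", 1)
    return NAMESPACES[prefix] + local_name


def objects(subject, predicate):
    """Return the object of every triple matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def address_of(person):
    """Follow exterms:address to the blank node and collect its parts."""
    (node,) = objects(person, "exterms:address")
    return {p.split(":", 1)[1]: str(o) for s, p, o in TRIPLES if s == node}


if __name__ == "__main__":
    print(expand("exterms:address"))
    print(address_of("exstaff:85740"))
```

The blank node never needs a URIref of its own; it exists only to tie the four address parts to the staff member, which is how an n-ary relationship becomes a set of binary ones.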
In the triples corresponding to Figure 2, exterms is the prefix portion of a QName, or XML Qualified Name, and has been assigned to a namespace URI. The full QName is formed by following the prefix with a colon and then a local name. In this case, exterms: specifies the namespace http://www.example.org/terms/, and exterms:address is shorthand for the URIref http://www.example.org/terms/address. The set of URIrefs that are defined for specific purposes within a particular namespace can be called a vocabulary. In the above example, then, the namespace for terms at example.org would include the vocabulary items address, street, city, state, and postalCode. Multiple vocabularies can be accessed within a statement, and these can be user-determined or come from well-defined namespaces, such as Dublin Core’s XML namespace, where the DC elements make up the vocabulary. For example, the QName dc:creator would refer to Dublin Core’s vocabulary element “creator” found in the namespace http://purl.org/dc/elements/1.1/. This feature allows for connections across systems and namespaces, enabling interoperability. Namespaces offer more adaptability than the reuse of predefined and established vocabulary sets alone: they make it possible for different users and systems to define and use metadata that fits their individual needs. This allows for great extensibility of the metadata.

While RDF as described above makes it possible to express statements about resources with named properties and values, user communities may want much greater granularity and specificity. To go beyond this level, RDF and RDFS offer the classes and properties described by Brickley and Guha (2004). Classes provide a way to specify categories of things. The rdf:Statement class is unsurprisingly the class of statements, and rdf:Property is the class of properties.
The class rdf:XMLLiteral specifies that something belongs to the category of XML literal values, which can be used for text that may have markup. RDF has classes of “containers” represented by rdf:Bag, rdf:Seq, and rdf:Alt, which specify groups that are, respectively, unordered, ordered, or a set of alternatives from which to select one (where the default is the first member). The class of RDF lists, rdf:List, belongs to the closed collection vocabulary; rdf:nil is an empty list. RDFS adds further possibilities for describing RDF’s vocabulary, including rdfs:Class for the class of classes (which even allows for the creation of classes of vocabulary terms), rdfs:Resource for the class of resources, and rdfs:Datatype for the class of datatypes. The class of plain or typed literal values, including strings and integers, is specified by rdfs:Literal; typed literals are instances of a datatype class. RDFS also offers rdfs:Container as the class of RDF containers, which makes it a superclass of the three RDF container classes. Finally, rdfs:ContainerMembershipProperty is the class of container membership properties, which are used to state that a resource is a member of a container. It is therefore a subclass of rdf:Property, and its instances are sub-properties of rdfs:member.

Brickley and Guha (2004) also go over the available properties in RDF and RDFS. For RDF, these include rdf:subject, rdf:predicate, and rdf:object, which specify the subject, predicate, and object of an RDF statement that is itself used as a subject; they are used with rdf:Statement for reification (where statements are resources and statements can be made about them). Structured values have a property rdf:value available for use, even though it has no meaning on its own. A property (and instance of rdf:Property) called rdf:type can be used to indicate that a resource belongs to a class.
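The reification vocabulary described above (rdf:Statement together with rdf:subject, rdf:predicate, and rdf:object) can be illustrated with a minimal Python sketch. The function name reify, the blank node identifier _:statement1, and the dc:creator statement attached to it are hypothetical choices made for illustration; only the four reification triples follow the pattern described by Brickley and Guha (2004).

```python
def reify(statement_node, triple):
    """Describe a triple as a resource using the RDF reification vocabulary.

    Returns four new triples whose shared subject, statement_node, stands
    for the original statement.
    """
    subject, predicate, obj = triple
    return [
        (statement_node, "rdf:type", "rdf:Statement"),
        (statement_node, "rdf:subject", subject),
        (statement_node, "rdf:predicate", predicate),
        (statement_node, "rdf:object", obj),
    ]


# The first triple from Figure 2, described as a statement.
original = ("exstaff:85740", "exterms:address", "_:johnaddress")
description = reify("_:statement1", original)

# A statement about the statement (illustrative only): who asserted it.
description.append(("_:statement1", "dc:creator", "exstaff:85740"))

if __name__ == "__main__":
    for triple in description:
        print(triple)
```

Note that the four reification triples describe the original statement without asserting it; the original triple must still be present in the graph if it is meant to hold.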
The first item of the list used as an RDF subject, and the rest of the list following that first item, can be specified by rdf:first and rdf:rest, which are used to build or describe ordered lists. RDFS also has several properties specified, including rdfs:label, which indicates that the object is a human-readable label for the resource, and rdfs:comment, which indicates that the object is a human-readable description of the resource. Other RDFS properties include rdfs:seeAlso and rdfs:isDefinedBy, which indicate which resource could be a source of additional information and which resource defines the subject resource. The property rdfs:domain describes the class of possible subjects of triples that use a given property, and rdfs:range describes the class of possible values, or objects, of those triples. Members of the subject resource can be indicated with rdfs:member, which is an instance of rdf:Property and a super-property of all the container membership properties. The property rdfs:subPropertyOf states that all resources related by one property are also related by another (setting the stage for entailment in properties). Likewise, rdfs:subClassOf is an instance of rdf:Property indicating that all instances of one class are instances of another, this time enabling entailment of classes.

These many classes and properties allow metadata to be created about the metadata vocabulary itself. This includes organizing metadata into hierarchies that support entailment and promote greater semantic exchange. For an example of a class hierarchy, see the Appendix.

Conclusion
RDF is fairly simple as a conceptual data model, representing statements about resources as a directed and labeled graph with resources or literals as the nodes and properties as the edges.
Still, some may find that the syntax notations or serialization formats involved in implementing RDF can get complex (Beckett, 2004) and quickly become very difficult to write out by hand as the underlying graph grows with increasing data. Tim Bray, one of the authors of the original XML specification and a one-time member of the RDF Working Group who helped invent the syntax, says:

    Speaking only for myself, I have never actually managed to write down a chunk of RDF/XML correctly, even when I had the triples laid out quite clearly in my head. Furthermore, once again speaking for myself, I find most existing RDF/XML entirely unreadable. And I think I understand the theory reasonably well. (2003, How to fix it)

This makes it very desirable to use a web service such as Calais (see http://viewer.opencalais.com/ for a demonstration of an automated service). Bray (2003) draws an unfavorable parallel between this situation and the early days of the Web, however, saying, “If, in 1994, you'd needed DreamWeaver or equivalent to write for the Web, there wouldn't be a Web today.”

Although RDF itself is a rather general framework for representing and exchanging information about resources on the Web, it can use the classes and properties provided for in RDFS (Brickley & Guha, 2004; Hayes, 2004) to describe vocabulary terms in more granular ways. It is also very adaptable due to its extensibility: it can incorporate existing well-described metadata schemas such as Dublin Core along with their standard vocabularies, and it allows new system-specific metadata vocabularies to be defined so that the nature of relationships can be specified using any terms desired. Additionally, RDF can easily relate resources specified within and across websites even when these sites use different metadata vocabularies, making it a framework that enables interoperability.
This combination of desirable qualities yields a powerful framework for building and sharing knowledge across different sites and systems. RDF clearly has both strengths and weaknesses. Only time will tell if these strengths are enough to make the RDF model the way forward, or if the difficulty of the syntax will prove too problematic.

References
Beckett, D. (Ed.). (2004). RDF/XML syntax specification. Retrieved from World Wide Web Consortium website: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/
Bray, T. (2003, May 21). The RDF.net challenge [Web log post]. Retrieved from http://www.tbray.org/ongoing/When/200x/2003/05/21/RDFNet
Brickley, D. (2001). Understanding RDF. Retrieved from http://ilrt.org/discovery/2001/01/understanding-rdf/
Brickley, D., & Guha, R. V. (Eds.). (2004). RDF vocabulary description language 1.0: RDF Schema. Retrieved from World Wide Web Consortium website: http://www.w3.org/TR/2004/REC-rdf-schema-20040210/
Dumbill, E. (Ed.). (2002). XML watch: Finding friends with XML and RDF. Retrieved from http://www.ibm.com/developerworks/xml/library/x-foaf/index.html
Grant, J., & Beckett, D. (Eds.). (2004). RDF test cases. Retrieved from World Wide Web Consortium website: http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/
Hayes, P. (Ed.). (2004). RDF semantics. Retrieved from World Wide Web Consortium website: http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
Hyland, B. (2010). Preparing for a linked data enterprise. In D. Wood (Ed.), Linking enterprise data (pp. 51-64). New York, NY: Springer.
Klyne, G., & Carroll, J. J. (Eds.). (2004). Resource Description Framework (RDF): Concepts and abstract syntax. Retrieved from World Wide Web Consortium website: http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
Lassila, O., & Swick, R. R. (Eds.). (1999). Resource Description Framework (RDF) model and syntax specification.
Retrieved from World Wide Web Consortium website: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
Manola, F., & Miller, E. (Eds.). (2004). RDF primer. Retrieved from World Wide Web Consortium website: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
Powers, S. (2009). Practical RDF [Kindle version].
RSS-DEV Working Group. (2000). RDF Site Summary (RSS) 1.0. Retrieved from http://web.resource.org/rss/1.0/spec
Sikos, L. (n.d.). The Resource Description Framework (RDF). Retrieved from http://www.lesliesikos.com/tutorials/rdf/
Tauberer, J. (n.d.). Quick intro to RDF. Retrieved from http://www.rdfabout.com/quickintro.xpd

Appendix
A vehicle class schema represented in graph form (from Manola & Miller, 2004)

Below is the RDF/XML serialization for the same vehicle schema.

    <?xml version="1.0"?>
    <!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xml:base="http://example.org/schemas/vehicles">

      <rdf:Description rdf:ID="MotorVehicle">
        <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
      </rdf:Description>

      <rdf:Description rdf:ID="PassengerVehicle">
        <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
        <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
      </rdf:Description>

      <rdf:Description rdf:ID="Truck">
        <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
        <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
      </rdf:Description>

      <rdf:Description rdf:ID="Van">
        <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
        <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
      </rdf:Description>

      <rdf:Description rdf:ID="MiniVan">
        <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
        <rdfs:subClassOf rdf:resource="#Van"/>
        <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
      </rdf:Description>

    </rdf:RDF>
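The class entailment that rdfs:subClassOf supports in the vehicle schema can be made concrete with a minimal Python sketch. It is not an RDF parser: it assumes the simple rdf:Description layout used in this appendix, works on an abbreviated copy of the schema (the DOCTYPE and rdf:type elements are omitted for brevity), and the names SCHEMA, direct_superclasses, and all_superclasses are hypothetical. Only the standard library module xml.etree.ElementTree is used.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Abbreviated copy of the vehicle schema: only the subclass statements.
SCHEMA = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://example.org/schemas/vehicles">
  <rdf:Description rdf:ID="MotorVehicle"/>
  <rdf:Description rdf:ID="PassengerVehicle">
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="Truck">
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="Van">
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="MiniVan">
    <rdfs:subClassOf rdf:resource="#Van"/>
    <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
  </rdf:Description>
</rdf:RDF>"""


def direct_superclasses(xml_text):
    """Map each class name to the classes it is directly a subclass of."""
    root = ET.fromstring(xml_text)
    parents = {}
    for desc in root.findall(f"{{{RDF}}}Description"):
        name = desc.get(f"{{{RDF}}}ID")
        parents[name] = [
            sub.get(f"{{{RDF}}}resource").lstrip("#")
            for sub in desc.findall(f"{{{RDFS}}}subClassOf")
        ]
    return parents


def all_superclasses(name, parents):
    """Follow rdfs:subClassOf transitively to collect every superclass."""
    found = set()
    stack = list(parents.get(name, []))
    while stack:
        cls = stack.pop()
        if cls not in found:
            found.add(cls)
            stack.extend(parents.get(cls, []))
    return found


PARENTS = direct_superclasses(SCHEMA)

if __name__ == "__main__":
    print(sorted(all_superclasses("MiniVan", PARENTS)))
```

Under this sketch, any resource typed as a MiniVan is also entailed to be a Van, a PassengerVehicle, and a MotorVehicle, even though the schema states the last of these only indirectly.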