Running head: CONSIDERING RDF
Considering RDF
Erin R. Clark
San José State University
Author Note
Erin R. Clark, School of Library and Information Science, San José State University.
This paper was written in partial fulfillment of the requirements for Seminar in
Contemporary Issues: Metadata, a graduate course offered at San José State University’s School
of Library and Information Science.
Correspondence concerning this paper should be directed to Erin Clark, a student of the
School of Library and Information Science, San José State University, San José, CA 95192.
E-mail: erin.clark@sjsu.desire2learn.com
Abstract
Resource Description Framework, commonly known as RDF, is a language used to conceptually
represent information about resources that can be identified by a URI (uniform resource
identifier); it is especially useful for representing the metadata of Web resources. This paper lays
out the basics of RDF, including the history and purpose of this data model, related schemes,
some implementations, helpful literature and documentation materials, a general description of
the framework, a discussion of how it can be used, and a consideration of its strengths and
weaknesses.
Considering RDF
The Resource Description Framework (RDF) is a general-purpose language that was
originally designed as a conceptual data model for representing information about entities and
relationships present in resources found or identifiable on the World Wide Web. This paper
gives an overview of RDF, beginning with the history and purpose of this data model, but also
covering some related schemes and a few implementations. A brief review of some helpful
literature and available documentation materials is followed by a general description of the
framework and a discussion of how it can be used. Consideration of RDF’s strengths and
weaknesses concludes the paper.
History and Purpose
After working on a specification of RDF for about two years beginning in 1997, the
World Wide Web Consortium (W3C) published its first RDF recommendation (Lassila & Swick,
1999). This effort was inspired by work done by Ramanathan Guha and Tim Bray to combine
Guha’s Meta Content Framework (MCF), a specification for structuring metadata about
websites and other data as objects with attached properties, with a human- and
machine-readable encoding format known as Extensible Markup Language (XML) (Bray, 2003,
The history of RDF). Brickley (2001) lists Dublin Core metadata and the Warwick Framework,
an architecture enabling the exchange of metadata across different systems, as other important
influences. These different strands were woven together with the specification for a Uniform
Resource Identifier (URI), a string identifying the name of a Web resource, by a working group
that later became the RDF Model and Syntax Working Group. This was a lot to cover in a single
document, however, and in 2004 the original 1999 RDF specification was updated and
replaced by a set of six new documents (Powers, 2009).
The purpose of RDF is to provide a way for statements to be connected across Web
resources that are built separately and in different systems; it provides a model in which
vocabularies in XML can be merged with other vocabularies. RDF enables the aggregation of
meaning and the encoding of knowledge, which allows for data interchange on the Web in a way
that can be understood by software. According to Tauberer (n.d.), RDF is suited to cases where
it is desirable to integrate data from different sources without custom programming, where the
data is being offered to other parties for reuse, where the data is distributed, or where a tool is
needed to do “something fancy with large amounts of data” without being tied to any
proprietary technology. In addition to the metadata schemes and
standards mentioned above as being ancestrally related to RDF, other related standards include
RDF Schema (RDFS), Web Ontology Language (OWL), and RDF in Attributes (RDFa). The
functionality established through the use of these standards is the necessary base upon which the
Semantic Web is built. RDF has been used to describe Web resources with Dublin Core
elements that specify the nature of relationships. Applications made possible by RDF include
Friend of a Friend (FOAF), which describes the relationships people have with other people or
with entities such as jobs, and builds up vast networks of connections that are
machine-interpretable (Dumbill, 2002), and some versions of RSS known as RDF Site Summary,
in which summary information about a website was connected to a link, later used as a
syndication format (RSS-DEV Working Group, 2000). RDF could also be used for linking
enterprise data (Hyland, 2010, p. 61).
Literature Review
There is a wealth of information about RDF to be found online. The World Wide Web
Consortium (W3C) hosts and maintains the six documents making up the 2004 specification as a
W3C Recommendation, the most mature stage of development for a standard. First, there is a
lengthy but informal primer (Manola & Miller, 2004) that covers a smattering of the material
available in some of the other documents as it describes RDF and RDF Schema from both the
conceptual side and the more technical notational side, along with some RDF applications.
Another document describes the fundamental nature of the RDF framework and ties the graph
data model to abstract specifications of the syntax (Klyne & Carroll, 2004); due to its conceptual
nature, this document may provide the easiest starting point for readers who are unfamiliar with
XML. Beckett (2004) provides a more detailed specification of the recommended serialization
format for RDF: RDF/XML, which is likely to be a little more difficult to understand for those
who are not particularly technical. Also highly technical, Hayes (2004) gives a precise
specification of the semantics and inference rules used by RDF and RDFS. RDF Schema, a
vocabulary description language that semantically extends the basic framework of RDF with
resource classes and relational properties, is documented more conceptually by Brickley and
Guha (2004). Finally, Grant and Beckett (2004) provide a test suite for RDF that goes over types
of tests as well as a listing of approved test cases for use by RDF implementers. Together, these
documents replace Lassila and Swick’s original 1999 specification of the data model and the
syntax of RDF. They are probably also some of the most useful documents for gaining a
complete understanding of RDF. Sikos (n.d.) gives a useful tutorial covering the most important
points on RDF, RDFS, and RDF/XML and two non-XML serializations, or notational syntaxes,
known as N3 and Turtle.
Description of the Scheme
What is RDF? It is essentially a data model representing statements about Web resources as a
“[directed and labeled] graph of nodes and arcs representing the resources, and their properties
and values” (Manola & Miller, 2004, Introduction, para. 3). Since RDF uses URIs rather than
URLs, it is even possible to represent information about resources that can be identified but not
retrieved on the Web. In RDF, these identifiable resources can be described as having properties
which have values by making statements. Statements are constructed out of subjects, predicates,
and objects; these three pieces comprise a triple. Rather intuitively, subjects are what the
statement is about, predicates are the properties or characteristics of the subjects that are
specified by the statement, and objects are the values of those properties. Statements can be
modeled by graphs having a constituent known as the subject node, the predicate (or property)
relationship that is an edge pointing from the subject to the object, and the object node. The
simplest version of this type of graph looks like this:
Figure 1. A simple RDF graph (from Klyne & Carroll, 2004)
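As a minimal sketch (not taken from the cited specifications), a statement can be held in Python as a (subject, predicate, object) tuple and a graph as a set of such tuples, so that each tuple is one labeled arc from a subject node to an object node. The names `Triple` and `outgoing_edges` are hypothetical, and the example URIrefs only imitate the style of the RDF Primer's example.org resources.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: an arc labeled `predicate` from `subject` to `obj`."""
    subject: str
    predicate: str
    obj: str


# Illustrative URIrefs in the style of the RDF Primer's example.org resources.
graph = {
    Triple(
        "http://www.example.org/index.html",
        "http://purl.org/dc/elements/1.1/creator",
        "http://www.example.org/staffid/85740",
    ),
}


def outgoing_edges(g, node):
    """Return (predicate, object) pairs for every arc that starts at `node`."""
    return sorted((t.predicate, t.obj) for t in g if t.subject == node)


for predicate, obj in outgoing_edges(graph, "http://www.example.org/index.html"):
    print(f"{predicate} -> {obj}")
```

Because a graph is just a set, adding the same statement twice leaves the graph unchanged, which matches RDF's view of a graph as a set of triples.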
Each of these three parts can be identified by a URI reference (URIref), although an object
node may instead be a constant value called a literal. Additionally, blank subject or object nodes
can be inserted where needed to aggregate concepts and turn n-ary relationships into binary
relationships, as when it is desirable to break a structured value, such as an address, into its
constituent parts. This strategy could also be used in cases of metonymy or for reified triples
that involve statements about statements. A blank node identifier is not considered part of the
graph itself; it is simply a distinct label that has significance only within the triples
representing a single graph.
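Because blank node labels only matter inside one graph, combining two graphs has to keep their blank nodes apart; the RDF Semantics document calls this standardizing apart when it defines a merge. The sketch below assumes blank nodes are written as strings starting with `_:` (the N-Triples convention), and the function `merge` and the second staff URIref are hypothetical.

```python
import itertools


def is_blank(term):
    """Assumption: blank nodes are strings starting with "_:", as in N-Triples."""
    return term.startswith("_:")


def merge(g1, g2):
    """Union two graphs, renaming g2's blank nodes so none collide with g1's.

    A blank node identifier only has meaning inside its own graph, so "_:x" in
    g1 and "_:x" in g2 must stay two different nodes after merging.
    """
    used = {term for triple in g1 for term in triple if is_blank(term)}
    counter = itertools.count()
    renamed = {}

    def fresh(term):
        if not is_blank(term):
            return term
        if term not in renamed:
            label = f"_:b{next(counter)}"
            while label in used:
                label = f"_:b{next(counter)}"
            used.add(label)
            renamed[term] = label
        return renamed[term]

    return set(g1) | {tuple(fresh(term) for term in triple) for triple in g2}


# Two hypothetical graphs that each use the local label "_:x" for an address node.
g1 = {("http://www.example.org/staffid/85740", "http://www.example.org/terms/address", "_:x")}
g2 = {
    ("http://www.example.org/staffid/12345", "http://www.example.org/terms/address", "_:x"),
    ("_:x", "http://www.example.org/terms/city", '"Bedford"'),
}
merged = merge(g1, g2)
```

After the merge, the two staff members point at two different address nodes, even though both source graphs happened to use the label `_:x`.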
Figure 2. Using a blank node in a graph (from Manola & Miller, 2004).
Statements can also be written as RDF triples, in which the first part of the triple encodes the
subject, the second the predicate, and the third the object. Using QName shorthand for the
URIrefs, the triples corresponding to the graph in Figure 2 look like the following:
exstaff:85740   exterms:address      _:johnaddress .
_:johnaddress   exterms:street       "1501 Grant Avenue" .
_:johnaddress   exterms:city         "Bedford" .
_:johnaddress   exterms:state        "Massachusetts" .
_:johnaddress   exterms:postalCode   "01730" .
Here, exterms is the prefix portion of a QName, or XML Qualified Name, and it has been
assigned to a namespace URI. The full QName is formed by following the prefix with a colon
and then a local name. In this case, then, exterms: specifies the namespace
http://www.example.org/terms/, and exterms:address is shorthand for the URIref
http://www.example.org/terms/address.
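A small sketch of QName expansion follows, with a helper that writes one triple in N-Triples, the line-per-triple notation defined in the RDF Test Cases document (Grant & Beckett, 2004). The exterms and dc bindings come from the text; the exstaff binding follows the RDF Primer's examples and is an assumption here. The helpers `expand` and `to_ntriples` are hypothetical and do not escape special characters inside literals.

```python
# Prefix bindings; exstaff's namespace is assumed from the RDF Primer's examples.
PREFIXES = {
    "exterms": "http://www.example.org/terms/",
    "exstaff": "http://www.example.org/staffid/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(qname, prefixes=PREFIXES):
    """Turn a QName such as 'exterms:address' into a full URIref.

    Blank node labels ('_:...') and quoted literals are returned unchanged.
    """
    if qname.startswith('"') or qname.startswith("_:"):
        return qname
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown prefix in {qname!r}")
    return prefixes[prefix] + local


def to_ntriples(s, p, o):
    """Write one triple as an N-Triples line, wrapping URIrefs in <...>."""
    def term(t):
        full = expand(t)
        return full if full.startswith(('"', "_:")) else f"<{full}>"
    return f"{term(s)} {term(p)} {term(o)} ."


print(expand("exterms:address"))
print(to_ntriples("_:johnaddress", "exterms:city", '"Bedford"'))
```

The first call prints http://www.example.org/terms/address, which is exactly the expansion described above.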
The set of URIrefs defined for specific purposes within a particular namespace can be called a
vocabulary. In the example above, then, the namespace for terms at
example.org would include the vocabulary items of address, street, city, state, and postalCode.
Multiple vocabularies can be accessed within a statement, and these can be user-determined or
come from well-defined namespaces, such as Dublin Core’s XML namespace where the DC
elements make up the vocabulary. For example, the QName dc:creator would refer to Dublin
Core’s vocabulary element “creator” found in the namespace http://purl.org/dc/elements/1.1/.
This feature allows for connections across systems and namespaces, enabling interoperability.
Namespaces offer more adaptability than access to predefined, established vocabulary sets: they
also make it possible for different users and systems to define and use metadata that fits their
individual needs, which makes the metadata highly extensible.
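To make the interoperability point concrete, here is a minimal sketch in which two independently built sources describe the same page. One uses Dublin Core terms and the other adds a hypothetical system-specific property (`reviewStatus`, invented for illustration). Because both sources identify the page by the same URIref and every property is a full URIref, a plain set union combines them without any custom mapping.

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element namespace
LOCAL = "http://www.example.org/terms/"   # hypothetical system-specific vocabulary

page = "http://www.example.org/index.html"

# Two independently built sources describe the same resource by the same URIref.
catalog = {
    (page, DC + "creator", "http://www.example.org/staffid/85740"),
    (page, DC + "language", '"en"'),
}
intranet = {
    (page, LOCAL + "reviewStatus", '"approved"'),  # hypothetical local property
    (page, DC + "creator", "http://www.example.org/staffid/85740"),  # same statement again
}


def describe(graph, subject):
    """Return every (property, value) pair stated about `subject`, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


combined = catalog | intranet  # set union also collapses the repeated statement
for prop, value in describe(combined, page):
    print(prop, value)
```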
While RDF as described above makes it possible to express statements about resources
with named properties and values, much greater granularity and specificity might be desired by
the user communities. To go beyond this level, RDF and RDFS offer classes and properties
described by Brickley and Guha (2004).
Classes provide a way to specify categories of things. The rdf:Statement class is
unsurprisingly the class of statements, and rdf:Property is the class of properties. The class
rdf:XMLLiteral indicates that something belongs to the category of XML literal values, which can
be used for text that may contain markup. RDF has classes of “containers” represented by
rdf:Bag, rdf:Seq, and rdf:Alt, which specify groups that are unordered, ordered, or a set of
alternatives from which one is selected (by default, the first member), respectively. Lists, the
vocabulary for closed collections, belong to the class rdf:List, and rdf:nil denotes the empty
list. RDFS adds further possibilities for describing RDF’s vocabulary, including rdfs:Class for
the class of classes, which even allows for the
creation of classes of vocabulary terms, rdfs:Resource for the class of resources, and
rdfs:Datatype for the class of datatypes. The class of literal values, plain or typed, such as
strings and integers, is rdfs:Literal; a typed literal is an instance of a datatype class.
RDFS also offers rdfs:Container as the class of RDF containers, which makes it the superclass
of the three RDF container classes. Finally, rdfs:ContainerMembershipProperty is the class of
container membership properties (rdf:_1, rdf:_2, and so on), which state that a resource is a
member of a container. It is therefore a subclass of rdf:Property, and each of its instances is a
subproperty of rdfs:member.
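The container mechanism can be sketched in a few lines: a bag node gets an rdf:type of rdf:Bag and one numbered membership property (rdf:_1, rdf:_2, ...) per item, and since every such property is a subproperty of rdfs:member, an rdfs:member statement can be inferred for each item. The helpers `make_bag` and `members` and the student URIrefs are hypothetical; the same numbering scheme would serve for rdf:Seq, where the order matters.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# rdf:_1, rdf:_2, ... are the container membership properties.
MEMBERSHIP = re.compile(re.escape(RDF) + r"_([1-9][0-9]*)$")


def make_bag(node, items):
    """Describe `items` as an rdf:Bag whose node is `node` (URIref or blank node)."""
    triples = {(node, RDF + "type", RDF + "Bag")}
    for i, item in enumerate(items, start=1):
        triples.add((node, f"{RDF}_{i}", item))
    return triples


def members(graph, container):
    """Infer rdfs:member triples, since each rdf:_n is a subproperty of rdfs:member."""
    return {
        (s, RDFS + "member", o)
        for s, p, o in graph
        if s == container and MEMBERSHIP.match(p)
    }


# Hypothetical students grouped in a bag.
bag = make_bag("_:students", [
    "http://www.example.org/students/Amy",
    "http://www.example.org/students/Mohamed",
])
```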
Brickley and Guha (2004) also go over the available properties in RDF and RDFS. For
RDF, these include rdf:subject, rdf:predicate, and rdf:object, which state the subject, predicate,
and object of a statement that is itself treated as a resource; they are used with rdf:Statement for
reification (where statements are resources and statements can be made about them). Structured
values have a property rdf:value available for use, even though it has no meaning on its own. A
property (an instance of rdf:Property) called rdf:type can be used to indicate that a resource
belongs to a class. The properties rdf:first and rdf:rest state, respectively, the first item of a list
and the rest of the list following that item, and together they are used to build or describe lists.
RDFS also has several properties specified, including rdfs:label, which is used for indicating that
the object is a human-readable label for the resource, and rdfs:comment, used for indicating that
the object is a human-readable description of the resource. Other RDFS properties include
rdfs:seeAlso and rdfs:isDefinedBy, which indicate a resource that could be a source of additional
information or the resource that defines the subject resource. The property rdfs:domain states
that any resource having a given property is an instance of a particular class, and rdfs:range
states that the values of a property are instances of a particular class. Members of the subject
resource, such as a container, can be indicated with rdfs:member, an instance of rdf:Property that
is a super-property of all container membership properties. The property rdfs:subPropertyOf
states that all resources related by one property are also related by another (setting the stage for
entailment in properties). Likewise, rdfs:subClassOf is an instance of rdf:Property indicating that
all instances of one class are instances of another, this time enabling entailment of classes.
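The rdf:first and rdf:rest pattern can be sketched as follows: each list cell is a blank node whose rdf:first points at an item and whose rdf:rest points at the next cell, and the last cell's rdf:rest is rdf:nil. The helpers `build_list` and `read_list`, and the `_:l` blank node labels, are hypothetical choices for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def build_list(items, prefix="_:l"):
    """Encode `items` as an RDF list; return (head node, set of triples).

    Cells are built back to front so each cell's rdf:rest can point at the
    cell already created for the remainder of the list.
    """
    triples = set()
    head = NIL
    for i, item in reversed(list(enumerate(items))):
        cell = f"{prefix}{i}"
        triples.add((cell, FIRST, item))
        triples.add((cell, REST, head))
        head = cell
    return head, triples


def read_list(graph, head):
    """Walk rdf:first / rdf:rest links from `head` back into a Python list."""
    index = {(s, p): o for s, p, o in graph}
    items = []
    while head != NIL:
        items.append(index[(head, FIRST)])
        head = index[(head, REST)]
    return items


head, triples = build_list(['"a"', '"b"', '"c"'])
```

An empty Python list maps straight to rdf:nil with no triples at all, matching the description of rdf:nil as the empty list.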
These many classes and properties allow metadata to be created about the metadata
vocabulary itself. This includes organizing metadata into hierarchies that support
entailment and promote greater semantic exchange. For an example of a class hierarchy, see
the Appendix.
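As an illustration of class entailment, the following sketch applies two RDFS rules to the vehicle hierarchy in the Appendix: rdfs:subClassOf is transitive, and an instance of a class is also an instance of each of its superclasses. It uses local class names instead of full URIrefs for readability, and it is a hypothetical fragment, not a complete RDFS reasoner.

```python
from collections import defaultdict

# (subclass, superclass) pairs from the Appendix schema, using local names only.
SUBCLASS_OF = {
    ("PassengerVehicle", "MotorVehicle"),
    ("Truck", "MotorVehicle"),
    ("Van", "MotorVehicle"),
    ("MiniVan", "Van"),
    ("MiniVan", "PassengerVehicle"),
}


def superclasses(cls, pairs=SUBCLASS_OF):
    """Every class that `cls` is a subclass of, following rdfs:subClassOf transitively."""
    parents = defaultdict(set)
    for sub, sup in pairs:
        parents[sub].add(sup)
    found, stack = set(), [cls]
    while stack:
        for sup in parents[stack.pop()]:
            if sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def entailed_types(asserted_types):
    """An instance of a class is also an instance of all of that class's superclasses."""
    return {c for t in asserted_types for c in {t} | superclasses(t)}


print(sorted(entailed_types({"MiniVan"})))
```

A resource typed only as a MiniVan is thus also entailed to be a Van, a PassengerVehicle, and a MotorVehicle.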
Conclusion
RDF is fairly simple as a conceptual data model, representing statements about resources
as a directed and labeled graph with resources or literals as the nodes and properties as the edges.
Still, some may find that the syntax notations or serialization formats involved in implementing
RDF can get complex (Beckett, 2004) and easily become very difficult to write out by hand as
the underlying graph grows with increasing data. Tim Bray, one of the authors of the original
XML specification and a one-time member of the RDF Working Group who helped invent the
syntax, says:
Speaking only for myself, I have never actually managed to write down a chunk of
RDF/XML correctly, even when I had the triples laid out quite clearly in my head.
Furthermore—once again speaking for myself—I find most existing RDF/XML entirely
unreadable. And I think I understand the theory reasonably well. (2003, How to fix it)
This makes automated services very attractive, such as the Calais web service, which returns RDF
generated from submitted text (see http://viewer.opencalais.com/ for a demonstration). Bray (2003)
draws an unfavorable parallel between this situation and the early days of the Web, however, saying,
“If, in 1994, you'd needed DreamWeaver or equivalent to write for the Web, there wouldn't be a
Web today.”
Although RDF itself is a rather general framework for representing and exchanging
information about resources on the Web, it can utilize classes and properties provided for in
RDFS (Brickley & Guha, 2004; Hayes, 2004) to describe the vocabulary terms in more granular
ways. It is also very adaptable due to its extensibility, or ability to incorporate and use existing
well-described metadata schemas such as Dublin Core and their standard vocabularies or even
allow for the definition of new system-specific metadata vocabularies in order to specify the
nature of relationships using any terms desired. Additionally, RDF can easily relate resources
specified within and across websites even when these sites use different metadata vocabularies,
making it a framework that enables interoperability. This combination of desirable qualities
yields a powerful framework for building and sharing knowledge across different sites and
systems.
RDF clearly has both strengths and weaknesses. Only time will tell whether its strengths are enough to
make the RDF model the way forward, or if the difficulty of the syntax will prove too
problematic.
References
Beckett, D. (Ed.). (2004). RDF/XML syntax specification. Retrieved from World Wide Web
Consortium website: http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/
Bray, T. (2003, May 21). The RDF.net challenge [Web log post]. Retrieved from
http://www.tbray.org/ongoing/When/200x/2003/05/21/RDFNet
Brickley, D. (2001). Understanding RDF. Retrieved from
http://ilrt.org/discovery/2001/01/understanding-rdf/
Brickley, D., & Guha, R. V. (Eds.). (2004). RDF vocabulary description language 1.0: RDF
Schema. Retrieved from World Wide Web Consortium website:
http://www.w3.org/TR/2004/REC-rdf-schema-20040210/
Dumbill, E. (Ed.). (2002). XML watch: finding friends with XML and RDF. Retrieved from
http://www.ibm.com/developerworks/xml/library/x-foaf/index.html
Grant, J., & Beckett, D. (Eds.). (2004). RDF test cases. Retrieved from World Wide Web
Consortium website: http://www.w3.org/TR/2004/REC-rdf-testcases-20040210/
Hayes, P. (Ed.). (2004). RDF semantics. Retrieved from World Wide Web Consortium website:
http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
Hyland, B. (2010). Preparing for a linked data enterprise. In D. Wood (Ed.), Linking
enterprise data (pp. 51-64). New York, NY: Springer.
Klyne, G., & Carroll, J. J. (Eds.). (2004). Resource Description Framework (RDF): Concepts
and abstract syntax. Retrieved from World Wide Web Consortium website:
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
Lassila, O., & Swick, R. R. (Eds.). (1999). Resource Description Framework (RDF) model and
syntax specification. Retrieved from World Wide Web Consortium website:
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
Manola, F., & Miller, E. (Eds.). (2004). RDF primer. Retrieved from World Wide Web
Consortium website: http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
Powers, S. (2009). Practical RDF [Kindle version].
RSS-DEV Working Group. (2000). RDF Site Summary (RSS) 1.0. Retrieved from
http://web.resource.org/rss/1.0/spec
Sikos, L. (n.d.). The Resource Description Framework (RDF). Retrieved from
http://www.lesliesikos.com/tutorials/rdf/
Tauberer, J. (n.d.). Quick intro to RDF. Retrieved from http://www.rdfabout.com/quickintro.xpd
Appendix
A vehicle class schema represented in graph form (from Manola & Miller, 2004)
Below is the RDF/XML serialization for the same vehicle schema.
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://example.org/schemas/vehicles">
  <rdf:Description rdf:ID="MotorVehicle">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
  </rdf:Description>
  <rdf:Description rdf:ID="PassengerVehicle">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="Truck">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="Van">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="MiniVan">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#Van"/>
    <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
  </rdf:Description>
</rdf:RDF>
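To show how this serialization maps back to triples, the sketch below reads a shortened copy of the schema with Python's standard `xml.etree.ElementTree`. Each `rdf:ID` is resolved against `xml:base` to form the subject URIref, and each `rdf:resource` gives the object. The DOCTYPE is left out because its `xsd` entity is never used. The function `description_triples` is hypothetical and handles only the flat pattern used here; it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Shortened copy of the Appendix schema (DOCTYPE omitted; its xsd entity is unused).
SCHEMA = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xml:base="http://example.org/schemas/vehicles">
  <rdf:Description rdf:ID="Van">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#MotorVehicle"/>
  </rdf:Description>
  <rdf:Description rdf:ID="MiniVan">
    <rdf:type rdf:resource="http://www.w3.org/2000/01/rdf-schema#Class"/>
    <rdfs:subClassOf rdf:resource="#Van"/>
    <rdfs:subClassOf rdf:resource="#PassengerVehicle"/>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(rdfxml):
    """Extract triples from flat rdf:Description elements with rdf:resource objects."""
    root = ET.fromstring(rdfxml)
    base = root.get(f"{{{XML_NS}}}base", "")
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        # rdf:ID="Van" names the URIref base + "#Van".
        subject = urljoin(base, "#" + desc.get(f"{{{RDF}}}ID"))
        for prop in desc:
            # ElementTree tags look like "{namespace}localname"; join them into a URIref.
            ns, _, local = prop.tag[1:].partition("}")
            obj = urljoin(base, prop.get(f"{{{RDF}}}resource"))
            triples.append((subject, ns + local, obj))
    return triples


for s, p, o in description_triples(SCHEMA):
    print(s, p, o)
```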