RDF Documentation for integration of the Livingstone Spectral Imaging Project into NINES
Adrian S. Wisnicki, March 2013
1.0. Overview
The RDF files used to integrate the Livingstone Spectral Imaging Project
(http://livingstone.library.ucla.edu/ [accessed 12 Mar. 2013]) into NINES (http://www.nines.org/
[accessed 12 Mar. 2013]) were created over two months in early 2013 after a few months of prior
planning and discussion between Adrian S. Wisnicki (Livingstone Spectral Imaging Project
Director), Lisa McAulay (Livingstone Spectral Imaging Project Developer), and Dana Wheeles
(NINES Project Manager).
We have created one RDF file for each Livingstone Spectral Imaging Project HTML file that
contains significant textual content. In those cases where the characterization of "significant textual
content" could not be made with full confidence, we have erred on the side of creating the RDF file.
In creating the RDF elements, we have followed the guidelines set out on the NINES "Submitting
RDF" Wiki page (http://wiki.collex.org/index.php/Submitting_RDF [accessed 12 Mar. 2013]),
although Dana Wheeles has advised on usage in a few cases where specifications had been
updated since the creation of the Wiki page.
As a general rule we provide the following 16 elements for each RDF file. Some of these elements
may appear more than once in any given RDF file as per usual NINES practice as set out on the
Wiki page:
<rdf:RDF>
<livingstone:lsip>
<collex:archive>
<role:AUT>
<role:EDT>
<role:PBL>
<dc:type>
<collex:discipline>
<collex:genre>
<dc:date>
<collex:date>
<rdfs:label>
<rdfs:value>
<collex:text>
<rdfs:seeAlso>
<collex:federation>
2.0. File Naming
RDF file names always take the following format:
bambarre-home.rdf
1871-project_planning.rdf
The first segment (that which precedes the hyphen) references the specific edition or archive to
which the RDF file is linked. The possibilities for this segment are limited to:
1871 (for the critical edition of Livingstone's 1871 Field Diary)
archive (for the Livingstone Spectral Image Archive)
bambarre (for the critical edition of Livingstone's Letter from Bambarre)
project (for the landing pages of the Livingstone Spectral Imaging Project as a whole)
The second segment (the remainder of text that precedes the .rdf) corresponds to the HTML file
name in most cases. The only exception to this rule applies to HTML files called "index". In those
cases, we use "home" as in the first example above.
Finally, the suffix indicates the file type, which is ".rdf" without exception.
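As an illustration only, the naming rule described in this section can be sketched in Python. The function name rdf_file_name and the constant EDITION_PREFIXES are hypothetical and do not come from the project's own tooling; the four prefixes and the "index" to "home" exception are taken from the text above.

```python
from pathlib import PurePosixPath

# The four first-segment values listed in section 2.0.
EDITION_PREFIXES = {"1871", "archive", "bambarre", "project"}


def rdf_file_name(prefix: str, html_file_name: str) -> str:
    """Build an RDF file name of the form <prefix>-<html stem>.rdf."""
    if prefix not in EDITION_PREFIXES:
        raise ValueError(f"unknown edition prefix: {prefix!r}")
    # Drop the HTML suffix (for example ".htm") to get the bare file name.
    stem = PurePosixPath(html_file_name).stem
    # The one documented exception: "index" pages are named "home".
    if stem == "index":
        stem = "home"
    return f"{prefix}-{stem}.rdf"


print(rdf_file_name("bambarre", "index.htm"))
print(rdf_file_name("1871", "project_planning.htm"))
```

The two calls reproduce the example file names given at the start of this section.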
3.0. Notes on use of RDF elements
3.1. <rdf:RDF>
As found in our RDF files, the <rdf:RDF> element always contains the <livingstone:lsip> element,
which, in turn, contains all the other RDF elements used by our project. The attributes provided for
the <rdf:RDF> element are the same across all RDF files created by our project.
3.2. <livingstone:lsip>
The <livingstone:lsip> element references our custom namespace as provided in the <rdf:RDF>
element (xmlns:livingstone="http://livingstone.library.ucla.edu/test"). The <livingstone:lsip> element
always takes an @rdf:about. The @rdf:about value is a URI specific to the HTML page to which
the RDF file corresponds. We have chosen to create distinct URIs for each HTML page because
these URIs can remain stable even if the HTML page URLs change in the future. As the NINES
Wiki notes: “These [URIs] are the most brittle aspect of the NINES system. If you change an id, all
the user-created content built on top of your object will be lost or ruined. This includes tags and
annotations as well as NINES exhibits, such as course syllabi or critical essays” (section on “The
Importance of Being Stable”).
The @rdf:about value always takes the following form:
http://livingstone.library.ucla.edu/1871diary/source-texts/transcriptions.htm
The first segment of this URI (http://livingstone.library.ucla.edu/) derives from the Livingstone
Spectral Imaging Project home page URL.
The second segment (1871diary) refers to the specific edition or archive within the Livingstone
Spectral Imaging Project that contains the given HTML page. As of this writing, the values for
this second segment can only be:
1871diary (for the critical edition of Livingstone's 1871 Field Diary)
bambarre (for the critical edition of Livingstone's Letter from Bambarre)
livingstone_archive (for the Livingstone Spectral Image Archive)
In the case of RDF files that reference the Livingstone Spectral Imaging Project landing pages, we
have omitted this second segment.
The third segment refers to the type of content in the HTML file. The third segment can only be:
criticism, project-history, source-texts, or collection.
The fourth (and final) segment refers to the actual file name of the HTML page. The URI always
ends with the ".htm" suffix.
In other words, in creating the URI for the <livingstone:lsip> element, we have taken the current
URL of each HTML page and added in the third segment described above. If this third segment is
removed from the URI, what remains is the URL for the given web page.
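The relationship between page URL and URI can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names url_to_uri and uri_to_url are hypothetical, and the sketch assumes the content-type segment always sits immediately before the HTML file name, as in the example URI above.

```python
# The four content-type values listed for the third segment.
CONTENT_TYPES = {"criticism", "project-history", "source-texts", "collection"}


def url_to_uri(page_url: str, content_type: str) -> str:
    """Insert the content-type segment just before the HTML file name."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    base, _, file_name = page_url.rpartition("/")
    return f"{base}/{content_type}/{file_name}"


def uri_to_url(uri: str) -> str:
    """Remove the content-type segment to recover the page URL."""
    base, _, file_name = uri.rpartition("/")
    parent, _, segment = base.rpartition("/")
    if segment not in CONTENT_TYPES:
        raise ValueError(f"no content-type segment found in: {uri!r}")
    return f"{parent}/{file_name}"


uri = url_to_uri(
    "http://livingstone.library.ucla.edu/1871diary/transcriptions.htm",
    "source-texts",
)
print(uri)
print(uri_to_url(uri))
```

The page URL used here is the one obtained by removing "source-texts" from the example URI given earlier in this section.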
Finally, it is important to note that in those cases where a given HTML page represents only the
first in a series of pages that collectively make up a discrete article, the URI corresponds only to the
first page, but stands in for all the pages together. As a result, we have not created separate RDF
files for other pages in the series.
3.3. <collex:archive>
The <collex:archive> element always takes the value "livingstone". This value is unique to our
project among the other projects contained by NINES.
3.4. <dc:type>, <collex:discipline>, and <collex:genre>
Our practice in using the <dc:type>, <collex:discipline>, and <collex:genre> elements is to be
inclusive rather than exclusive. As a result, most of our RDF files contain multiple instances of
these three elements, as we have made it a practice to include all values (from the set of values
provided by NINES) that apply to our project.
3.5. <dc:date>, <collex:date>, <rdfs:label>, and <rdfs:value>
As found in our RDF files, the <dc:date> element always contains the <collex:date> element, which
in turn always contains the <rdfs:label> and <rdfs:value> elements.
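The nesting described here can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration, not the method used to produce our files. The function name build_date is hypothetical, the Collex namespace URI is an assumption that should be checked against the NINES Wiki page, and the label and value strings are placeholders rather than values from any actual RDF file.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Assumption: Collex schema namespace; verify against the NINES Wiki page.
COLLEX = "http://www.collex.org/schema#"

for prefix, namespace in (("dc", DC), ("rdfs", RDFS), ("collex", COLLEX)):
    ET.register_namespace(prefix, namespace)


def build_date(label: str, value: str) -> ET.Element:
    """Return <dc:date> wrapping <collex:date>, <rdfs:label>, <rdfs:value>."""
    dc_date = ET.Element(f"{{{DC}}}date")
    collex_date = ET.SubElement(dc_date, f"{{{COLLEX}}}date")
    ET.SubElement(collex_date, f"{{{RDFS}}}label").text = label
    ET.SubElement(collex_date, f"{{{RDFS}}}value").text = value
    return dc_date


# Placeholder label and value, for illustration only.
print(ET.tostring(build_date("2013", "2013"), encoding="unicode"))
```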
3.6. <collex:text>
The <collex:text> element always contains the full text of the given HTML page except for the
header and sidebar, both of which are common to all pages within a given edition or archive. This
text has been added to the RDF file by means of cut and paste, first from the HTML page
(including those pages where text is rendered from the XML) into a plain text file (in order to
remove any formatting anomalies), then from the plain text file into the RDF file. Where needed,
characters such as "&" have been converted to "&amp;" in order to be RDF compliant. Finally,
when a given RDF file corresponds to a discrete article that spans a series of HTML pages, we
include the full text for all these pages within this <collex:text> element.
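Our text was prepared by hand, but the character conversion and the joining of multi-page articles could be approximated in Python as sketched below. The function name prepare_collex_text is hypothetical, and separating pages with a blank line is an assumption of this sketch rather than a documented project convention. The escape function comes from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape


def prepare_collex_text(page_texts):
    """Join the plain text of one or more pages and escape XML characters."""
    # Assumption: pages of a multi-page article are separated by a blank line.
    joined = "\n\n".join(text.strip() for text in page_texts)
    # escape() converts "&" to "&amp;", "<" to "&lt;", and ">" to "&gt;".
    return escape(joined)


print(prepare_collex_text(["Livingstone & Stanley", "Second page of the article"]))
```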
3.7. <rdfs:seeAlso>
The <rdfs:seeAlso> element references the URL of the given HTML page and always takes an
@rdf:resource, the value of which is the actual URL. In those cases where a given HTML page
represents only the first in a series of pages that collectively make up a discrete article, the URL
given here corresponds only to the first page, and we do not provide the URLs for the other pages in
the given series.
3.8. <collex:federation>
The <collex:federation> element always takes the value "NINES".
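To show how several of the elements described in section 3.0 fit together, the following Python sketch builds a partial record with the standard xml.etree.ElementTree module. It is an illustration only and omits most of the 16 elements. The function name build_record is hypothetical, the Collex namespace URI is an assumption to be checked against the NINES Wiki page, and the livingstone namespace and the two fixed values are those stated in sections 3.2, 3.3, and 3.8.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Assumption: Collex schema namespace; verify against the NINES Wiki page.
COLLEX = "http://www.collex.org/schema#"
# Custom namespace as given in section 3.2.
LIVINGSTONE = "http://livingstone.library.ucla.edu/test"

for prefix, namespace in (
    ("rdf", RDF),
    ("rdfs", RDFS),
    ("collex", COLLEX),
    ("livingstone", LIVINGSTONE),
):
    ET.register_namespace(prefix, namespace)


def build_record(uri: str, page_url: str) -> ET.Element:
    """Build <rdf:RDF> containing <livingstone:lsip> with three child elements."""
    root = ET.Element(f"{{{RDF}}}RDF")
    lsip = ET.SubElement(
        root, f"{{{LIVINGSTONE}}}lsip", {f"{{{RDF}}}about": uri}
    )
    ET.SubElement(lsip, f"{{{COLLEX}}}archive").text = "livingstone"
    ET.SubElement(lsip, f"{{{RDFS}}}seeAlso", {f"{{{RDF}}}resource": page_url})
    ET.SubElement(lsip, f"{{{COLLEX}}}federation").text = "NINES"
    return root


record = build_record(
    "http://livingstone.library.ucla.edu/1871diary/source-texts/transcriptions.htm",
    "http://livingstone.library.ucla.edu/1871diary/transcriptions.htm",
)
print(ET.tostring(record, encoding="unicode"))
```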