RDF Documentation for integration of the Livingstone Spectral Imaging Project into NINES
Adrian S. Wisnicki, March 2013

1.0. Overview

The RDF files used to integrate the Livingstone Spectral Imaging Project (http://livingstone.library.ucla.edu/ [accessed 12 Mar. 2013]) into NINES (http://www.nines.org/ [accessed 12 Mar. 2013]) were created over two months in early 2013 after a few months of prior planning and discussion between Adrian S. Wisnicki (Livingstone Spectral Imaging Project Director), Lisa McAulay (Livingstone Spectral Imaging Project Developer), and Dana Wheeles (NINES Project Manager).

We have created one RDF file for each Livingstone Spectral Imaging Project HTML file that contains significant textual content. In those cases where the characterization of "significant textual content" could not be made with full confidence, we have erred on the side of creating the RDF file.

In creating the RDF elements, we have followed the guidelines set out on the NINES "Submitting RDF" Wiki page (http://wiki.collex.org/index.php/Submitting_RDF [accessed 12 Mar. 2013]), although Dana Wheeles has advised on usage in a few cases where specifications had been updated since the creation of the Wiki page.

As a general rule we provide the following 16 elements for each RDF file. Some of these elements may appear more than once in any given RDF file as per usual NINES practice as set out on the Wiki page:

    <rdf:RDF>
    <livingstone:lsip>
    <collex:archive>
    <role:AUT>
    <role:EDT>
    <role:PBL>
    <dc:type>
    <collex:discipline>
    <collex:genre>
    <dc:date>
    <collex:date>
    <rdfs:label>
    <rdfs:value>
    <collex:text>
    <rdfs:seeAlso>
    <collex:federation>

2.0. File Naming

RDF file names always take the following format:

    bambarre-home.rdf
    1871-project_planning.rdf

The first segment (that which precedes the hyphen) references the specific edition or archive to which the RDF file is linked.
The possibilities for this segment are limited to:

    1871 (for the critical edition of Livingstone's 1871 Field Diary)
    archive (for the Livingstone Spectral Image Archive)
    bambarre (for the critical edition of Livingstone's Letter from Bambarre)
    project (for the landing pages of the Livingstone Spectral Imaging Project as a whole)

The second segment (the remainder of text that precedes the .rdf) corresponds to the HTML file name in most cases. The only exception to this rule applies to HTML files called "index". In those cases, we use "home" as in the first example above. Finally, the suffix indicates the file type, which is ".rdf" without exception.

3.0. Notes on use of RDF elements

3.1. <rdf:RDF>

As found in our RDF files, the <rdf:RDF> element always contains the <livingstone:lsip> element, which, in turn, contains all the other RDF elements used by our project. The attributes provided for the <rdf:RDF> element are the same across all RDF files created by our project.

3.2. <livingstone:lsip>

The <livingstone:lsip> element references our custom namespace as provided in the <rdf:RDF> element (xmlns:livingstone="http://livingstone.library.ucla.edu/test"). The <livingstone:lsip> element always takes an @rdf:about. The @rdf:about value is a URI specific to the HTML page to which the RDF file corresponds. We have chosen to create distinct URIs for each HTML page because these URIs can remain stable even if the HTML page URLs change in the future. As the NINES Wiki notes: “These [URIs] are the most brittle aspect of the NINES system. If you change an id, all the user-created content built on top of your object will be lost or ruined. This includes tags and annotations as well as NINES exhibits, such as course syllabi or critical essays” (section on “The Importance of Being Stable”).
The @rdf:about value always takes the following form:

    http://livingstone.library.ucla.edu/1871diary/source-texts/transcriptions.htm

The first segment of this URI (http://livingstone.library.ucla.edu/) derives from the Livingstone Spectral Imaging Project home page URL. The second segment (1871diary) refers to the specific edition or archive within the Livingstone Spectral Imaging Project that contains the given HTML page. At the time of writing, the values for this second segment can only be:

    1871diary (for the critical edition of Livingstone's 1871 Field Diary)
    bambarre (for the critical edition of Livingstone's Letter from Bambarre)
    livingstone_archive (for the Livingstone Spectral Image Archive)

In the case of RDF files that reference the Livingstone Spectral Imaging Project landing pages, we have omitted this second segment.

The third segment refers to the type of content in the HTML file. The third segment can only be: criticism, project-history, source-texts, or collection. The fourth (and final) segment refers to the actual file name of the HTML page. The URI always ends with the ".htm" suffix.

In other words, in creating the URI for the <livingstone:lsip> element, we have taken the current URL of each HTML page and added in the third segment described above. If this third segment is removed from the URI, what remains is the URL for the given web page.

Finally, it is important to note that in those cases where a given HTML page represents only the first in a series of pages that collectively make up a discrete article, the URI corresponds only to the first page, but stands in for all the pages together. As a result, we have not created separate RDF files for other pages in the series.

3.3. <collex:archive>

The <collex:archive> element always takes the value of "livingstone". This value is unique to our project among the projects contained in NINES.

3.4. <dc:type>, <collex:discipline>, and <collex:genre>

Our practice in using the <dc:type>, <collex:discipline>, and <collex:genre> elements is to be inclusive rather than exclusive. As a result, most of our RDF files contain multiple instances of these elements, as we have made it a practice to include all values (from the set of values provided by NINES) that apply to our project.

3.5. <collex:date>, <rdfs:label>, and <rdfs:value>

As found in our RDF files, the <dc:date> element always contains the <collex:date> element, which in turn always contains the <rdfs:label> and <rdfs:value> elements.

3.6. <collex:text>

The <collex:text> element always contains the full text of the given HTML page except for the header and sidebar, both of which are common to all pages within a given edition or archive. This text has been added to the RDF file by means of cut and paste, first from the HTML page (including those pages where text is rendered from the XML) into a plain text file (in order to remove any formatting anomalies), then from the plain text file into the RDF file. Where needed, characters such as "&" have been converted to "&amp;" in order to be RDF compliant. Finally, when a given RDF file corresponds to a discrete article that spans a series of HTML pages, we include the full text for all these pages within this <collex:text> element.

3.7. <rdfs:seeAlso>

The <rdfs:seeAlso> element references the URL of the given HTML page and always takes an @rdf:resource, the value of which is the actual URL. In those cases where a given HTML page represents only the first in a series of pages that collectively make up a discrete article, the URL given here corresponds only to the first page, and we do not provide the URLs for the other pages in the given series.

3.8. <collex:federation>

The <collex:federation> element always takes the value of "NINES".
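
The naming rules in sections 2.0 and 3.2 can be summarized in a short Python sketch. It is an illustration only, not a tool used by the project: the function names are hypothetical, and the code assumes that every page URL has the form http://livingstone.library.ucla.edu/[edition/]file.htm, with the edition segment absent for landing pages. It also assumes that, for landing pages, the content-type segment is inserted directly after the site root. The content type itself has to be chosen by hand, since it does not appear in the page URL.

```python
BASE_URL = "http://livingstone.library.ucla.edu/"

# URI edition segment -> RDF file-name prefix, pairing the two lists given in
# sections 2.0 and 3.2. The empty string stands for the project landing pages,
# which have no edition segment (an assumption about how they are addressed).
EDITION_TO_PREFIX = {
    "1871diary": "1871",
    "bambarre": "bambarre",
    "livingstone_archive": "archive",
    "": "project",
}

# Permitted values for the content-type segment (section 3.2).
CONTENT_TYPES = {"criticism", "project-history", "source-texts", "collection"}


def split_page_url(page_url):
    """Return (edition_segment, html_file_name) for a project page URL."""
    if not page_url.startswith(BASE_URL):
        raise ValueError(f"not a project URL: {page_url}")
    parts = page_url[len(BASE_URL):].split("/")
    if len(parts) == 1:
        edition, file_name = "", parts[0]  # landing page, no edition segment
    elif len(parts) == 2:
        edition, file_name = parts
    else:
        raise ValueError(f"unexpected URL depth: {page_url}")
    if edition not in EDITION_TO_PREFIX:
        raise ValueError(f"unknown edition segment: {edition}")
    if not file_name.endswith(".htm"):
        raise ValueError(f"expected an .htm page: {page_url}")
    return edition, file_name


def rdf_about_uri(page_url, content_type):
    """Build the @rdf:about URI by inserting the content-type segment."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type}")
    edition, file_name = split_page_url(page_url)
    # Skip the edition segment when it is empty (landing pages).
    segments = [s for s in (edition, content_type, file_name) if s]
    return BASE_URL + "/".join(segments)


def rdf_file_name(page_url):
    """Build the RDF file name: <prefix>-<html name>.rdf, 'index' -> 'home'."""
    edition, file_name = split_page_url(page_url)
    stem = file_name[: -len(".htm")]
    if stem == "index":
        stem = "home"
    return f"{EDITION_TO_PREFIX[edition]}-{stem}.rdf"


if __name__ == "__main__":
    page = BASE_URL + "1871diary/transcriptions.htm"
    print(rdf_about_uri(page, "source-texts"))
    print(rdf_file_name(BASE_URL + "bambarre/index.htm"))
```

Under these assumptions, the first call reproduces the example URI shown in section 3.2 and the second reproduces the file name bambarre-home.rdf from section 2.0. Removing the content-type segment from the generated URI gives back the page URL, as described in section 3.2.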