Archivists’ Toolkit Preliminaries: Architecture, DB
Leslie Myrick
NYU

Possible Java Architecture
• JSP Model 2 Architecture
  – Servlet Controller
    • Handles requests, View selection, instantiates beans
  – JSPs update the View in the browser
  – JavaBeans used to represent the object in memory; access DB using JDBC
    • manage the Model
  – JDBC connection to the data source

Similar Use of Servlet/JSP Model in Digital Library Applications
• DSpace
• UC Berkeley’s GenX system
• CDL Preservation Repository

JSP Model 2
• Cleanest separation of presentation and content
  – Clear delineation of roles of developers and designers
• Takes advantage of strengths of servlets and JSPs for serving dynamic content
  – JSP for presentation layer
  – Servlets for performing process-intensive tasks
• Servlet as Controller: in charge of request processing, creating the beans or objects used by the JSPs, and deciding which JSP to forward the request to
• No processing logic in JSPs: they simply retrieve the objects or beans instantiated by the servlet

[Figure: JSP Model 2 Architecture]

JSP Model 1
• Bulk of processing performed by JSP
  – Processes requests and draws the view
• Fine for simple applications

[Figure: JSP Model 1 Architecture]

MySQL vs PostgreSQL
• Both ACID compliant (transaction safe)
• Both support referential integrity (as of MySQL 4.x)
• MySQL faster; PostgreSQL more robust
• Finer-grained locking in PostgreSQL
  – Multi-Version Concurrency Control (MVCC) in PostgreSQL
• Want triggers? Views? Inheritance? For now go with PostgreSQL
• MySQL has built-in full-text search capability
• Ease of installation and maintenance: MySQL hands down

The ACID test
• Atomicity - All elements of a given transaction take place or none do.
• Consistency - Each transaction transforms the database from one valid state to another valid state.
• Isolation - The effects of a transaction are not visible to other transactions in the system until it is complete.
• Durability - Once a transaction has been committed, its effects are permanent, even if the system crashes or a disk dies.

Proposed DB Schema: Archaeology / Genealogy
• Ultimately based on MOA II model
• With refinements to NYU’s zeroDB schema for digital object metadata
• Torqued to describe archival objects and their digital surrogates
• Same essential hook: pure Aristotelian hierarchy

It all comes down to object
• Pivotal entity is object nesting other objects
  – objectType can be fonds, collection, component
  – componentType can be series, file, item, accretion
• Object hierarchy maintained through:
  – objectID, parentID, nextSibID

Object Table
object (foreign-key markers on the diagram: FK1, FK4, FK5, FK2, FK3)
  objectID (PK)
  objectTypeID
  componentTypeID
  parentID
  nextSibID
  hasChildren
  rightsID
  accessionID
  provenanceID
  physDescID
  processFinal
  physLocID

Accession Table
accession
  accessionID (PK)
  accessionTypeID
  resourceID
  recordCollectionTypeID
  collectionSurvey
  processingPlan
  processingNote
  acqinfo
  accruals
  appraisal
  abstract
  generalNote
  scopecontent
  arrangement
  accessrestrict
  preservationNote
  conservationNote
  otherfindaid
  transferFinal

Provenance Table
provenance
  provenanceID (PK)
  bioghist
  bibliography
  custodhist
  fileplan
  donorNote
  provenanceNote

Physical Location Tables
physLoc
  physLocID (PK)
  physLocLevelID (FK1)
  physLocTypeID (FK2)
  physLoc
  isPublic
  objectID
physLocLevel
  physLocLevelID (PK)
  physLocLevel
physLocType
  physLocTypeID (PK)
  physLocType

CREATE TABLE physLoc (
  physLocID int(11) NOT NULL auto_increment,
  physLocLevelID int(11) NOT NULL default '0',
  physLocTypeID int(11) NOT NULL default '0',
  physLoc varchar(128) NOT NULL default '',
  isPublic tinyint(1) unsigned NOT NULL default '0',
  PRIMARY KEY (physLocID)
);

--
-- Data for table physLocLevel
--
INSERT INTO physLocLevel (physLocLevel) VALUES ('repository');
INSERT INTO physLocLevel (physLocLevel) VALUES ('internal location');
INSERT INTO physLocLevel (physLocLevel) VALUES ('physical container');

--
-- Data for table physLocType
--
INSERT INTO physLocType
(physLocType) VALUES ('accession location');
INSERT INTO physLocType (physLocType) VALUES ('processing location');
INSERT INTO physLocType (physLocType) VALUES ('shelflist location');
INSERT INTO physLocType (physLocType) VALUES ('offsite location');

Ingest of Legacy Data from MARCXML
• Student Programmers’ Assignment
• Will probably involve JAXP/DOM
• Conversion already undertaken: records from Innopac iiirecord DTD to marc21slim schema; tape .mrc to MARCXML using MARC4J

Ingest of Legacy Data from EAD
• Testbed creation tool
• XSLT with Java Extensions using Xalan
  – Get nextID from database
  – Extensions instantiate and increment DBID, parentID, nextSibID for each component in <dsc>
  – Write out to .sql file to dump into DB

<!-- Declare the Java-backed counter extension (elements init/incr, function read) -->
<xalan:component prefix="counter" elements="init incr" functions="read">
  <xalan:script lang="javaclass" src="xalan://MyCounter"/>
</xalan:component>

<xsl:template match="/">
  <counter:init name="index"/>
  <!-- remainder of this template is not shown on the slide -->
</xsl:template>

<xsl:template name="dsc">
  <xsl:for-each select="ead/archdesc/dsc">
    <!-- Remember the ID of the <dsc> itself; its c01 children use it as PARENTID -->
    <xsl:variable name="dsc-parentID"><xsl:value-of select="counter:read('index')"/></xsl:variable>
    <counter:incr name="index"/>
    <xsl:for-each select="c01">
      DBID: <xsl:value-of select="counter:read('index')"/>
      PARENTID <xsl:value-of select="$dsc-parentID"/>
      Series: c01-<xsl:number/>
      Unittitle: <xsl:apply-templates select="did/unittitle"/>
      Abstract: <xsl:apply-templates select="did/abstract"/>
      <xsl:if test="./child::scopecontent">
        Scopecontent:<xsl:for-each select="scopecontent/p"><xsl:apply-templates select="."/></xsl:for-each>
      </xsl:if>
      <!-- remainder of the loop body is not shown on the slide -->
    </xsl:for-each>
  </xsl:for-each>
</xsl:template>

DBID: 3 PARENTID 2 Series: c01-1 Unittitle: Series I: Documentary Material
DBID: 4 PARENTID:3 Subseries: c02-1 Unittitle: Subseries A: Subjects
DBID:5 PARENTID: 4 Subseries: c03-1 Box: 1 Folder: 1 Unittitle: Advertising Unitdate:undated
DBID:6 PARENTID: 4 Subseries: c03-2 Box: 1 Folder: 2-6 Unittitle: Art & Collecting Unitdate: undated

DBID: 3 PARENTID: 2 NEXTSIBID: 126 Series: c01-1 Unittitle: Series I: Documentary Material

INSERT INTO OBJECT (objectID,
parentID, nextSibID, hasChildren, componentTypeID) VALUES (3,2,126,1,1);
INSERT INTO TITLE (titleID, titleTypeID, title, objectID) VALUES (NULL,1,'Series I: Documentary Material',3);
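
The numbering scheme behind these INSERT statements (objectID assigned in document order, parentID pointing at the enclosing component, nextSibID pointing at the following sibling) can be sketched in plain Java. This is only an illustration under stated assumptions, not the Toolkit's or the XSLT extension's actual code: the class HierarchyNumbering, its Node type, and the methods number, toSql, and sampleTree are hypothetical names; the "Series II" node is invented so that Series I has a sibling; and the sketch assumes the root has no parent and that the last sibling stores NULL in nextSibID, which the slides do not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: assign objectID / parentID / nextSibID to a component tree. */
public class HierarchyNumbering {

    /** One archival component (dsc, c01, c02, ...). Not a Toolkit class. */
    static final class Node {
        final String title;
        final List<Node> children = new ArrayList<>();
        int objectId;
        Integer parentId;   // null for the root (assumption)
        Integer nextSibId;  // null for the last sibling (assumption)

        Node(String title) { this.title = title; }

        Node add(Node child) { children.add(child); return this; }
    }

    /** Numbers the subtree in document (pre-order) order; returns the next unused ID. */
    static int number(Node node, Integer parentId, int nextId) {
        node.objectId = nextId++;
        node.parentId = parentId;
        for (Node child : node.children) {
            nextId = number(child, node.objectId, nextId);
        }
        // Each child points at the sibling that follows it.
        for (int i = 0; i + 1 < node.children.size(); i++) {
            node.children.get(i).nextSibId = node.children.get(i + 1).objectId;
        }
        return nextId;
    }

    private static String sqlInt(Integer value) {
        return value == null ? "NULL" : value.toString();
    }

    /** Appends one INSERT per node, parents before children. */
    static void toSql(Node node, List<String> out) {
        out.add(String.format(
            "INSERT INTO OBJECT (objectID, parentID, nextSibID, hasChildren) VALUES (%d, %s, %s, %d);",
            node.objectId, sqlInt(node.parentId), sqlInt(node.nextSibId),
            node.children.isEmpty() ? 0 : 1));
        for (Node child : node.children) {
            toSql(child, out);
        }
    }

    /** A cut-down version of the finding aid shown above; "Series II" is invented. */
    static Node sampleTree() {
        return new Node("dsc")
            .add(new Node("Series I: Documentary Material")
                .add(new Node("Subseries A: Subjects")
                    .add(new Node("Advertising"))
                    .add(new Node("Art & Collecting"))))
            .add(new Node("Series II (hypothetical)"));
    }

    public static void main(String[] args) {
        Node root = sampleTree();
        number(root, null, 2);  // the <dsc> gets ID 2, as in the sample output
        List<String> sql = new ArrayList<>();
        toSql(root, sql);
        sql.forEach(System.out::println);
    }
}
```

With this cut-down tree, Series I receives objectID 3, parentID 2, and nextSibID 7, because its three descendants take IDs 4 to 6. In the full finding aid the same rule yields nextSibID 126, since the next series only starts after all of Series I's descendants have been numbered.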