Finding a New Way
Using Automated Business Rules to Process Electronic Records
Richard Pearce-Moses
Deputy Director for Technology & Information Resources
Arizona State Library, Archives and Public Records

PeDALS: Persistent Digital Archives and Library System
► Arizona State Library, Archives and Public Records
► Florida State Library and Archives
► South Carolina Department of Archives & History
► South Carolina State Library
► New York State Library
► New York State Archives
► Wisconsin State Archives
► Two more partners
► Kudos to Washington State Archives

Curatorial Rationale
► Question traditional, paper-based practices in order to transform them into the digital era
    Appraisal
    Acquisition
    Arrangement and description
    Housing and storage
    Reference and access
    Preservation
► Preserving archival principles of provenance, context, collective control, and authenticity and integrity

Technical Goals
► To demonstrate the use of middleware to implement business rules in software as an integrated workflow to process collections of records and publications
► To build “digital stacks” using LOCKSS as the basis of an inexpensive storage network that can preserve the authenticity and integrity of the materials.

Additional Goals
► To build a community of shared practice that meets the needs of a wide range of repositories
    For best practices ~ appropriate practices, what works, what’s practical
    For resource sharing ~ avoid redundant work
► To remove barriers to preservation by keeping costs as low as possible

Preliminary Results: A New Research Agenda
► Vulcan mind meld
    Immediate understanding, no confusion
► Cloning
    No time wasted in job search, known results
► Time travel
    All the time you need while meeting deadlines

PeDALS at 50,000 feet
► Based on OAIS Reference Model
► Metadata
    Transforms and normalizes received metadata
    Enhances received metadata
► Archival Information Package
    Creates and stores in LOCKSS
► Dissemination Information Package
    Creates and publishes to the web

Appropriate Record Sets
► Ideal scenario
    Created, stored in a recordkeeping system
    Indexed
► Likely to succeed
    Certificates
    Email
    Indexed documents
► Less likely to succeed
    Hard drives with no index
► Sufficient number and consistency to allow rules

Curatorial Rationale
► Focus on why, not just how
► Strategic shift in how we work
    Not limited to doing things differently
    Doing different things
► Curators work with rules, not records
    Describe business processes (rules)
    Monitor the process for quality assurance

Metadata and Queries
► Single schema
    Administration, discovery, preservation
► Elements common to all government records
    Definition and cross-walks
    Rationale
► What is it
► Who uses it
► For what purpose

Example: Item Title
► Definition: The word or phrase, taken from a prescribed source, by which a work is known
► Rationale: Serves as a "handle" to represent the object at an abstract level in lists, such as search results. A supplied title should contain sufficient information to aid patrons in the selection of materials. Because date is preferred and included in search results by default, the title need not include date information.
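
Sketch: Metadata Cross-walk
The slides above describe transforming and normalizing received metadata into a single schema by way of cross-walks. The following is a minimal Python sketch of that idea, not the PeDALS implementation (the project's rules run in middleware). Every field name, element name, and the `normalize` function are hypothetical, chosen only for illustration.

```python
# Hypothetical cross-walk: received field name -> common schema element.
# None of these names come from the PeDALS schema; they are assumptions.
CROSSWALK = {
    "doc_title": "item_title",
    "TITLE": "item_title",
    "rec_date": "date",
    "agency": "creator",
}


def normalize(received, crosswalk=CROSSWALK):
    """Map received metadata onto common elements and tidy the values."""
    normalized = {}
    for field, value in received.items():
        element = crosswalk.get(field)
        if element is None:
            # Unmapped fields are skipped here; they would survive only
            # in the received metadata kept alongside the record.
            continue
        # Collapse runs of whitespace so values compare consistently.
        normalized[element] = " ".join(value.split())
    return normalized


received = {"doc_title": "  Marriage   certificate ", "clerk_code": "17"}
print(normalize(received))
```

Keeping the cross-walk as data rather than code is one way a rule written for one record set might be adapted to another by swapping the mapping.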

Integrated Workflow

Creation
► Prepare Submission Information Package
    Extract records for transmission
    Extract metadata
    Create shipping manifest
► Negotiation
    File formats for records, metadata, shipping manifest
    Transfer methodology
    Frequency of transfer

Description
► Traditional archival description
    Provenance
    Series
    Acquisition
► Rules-based description
    Metadata mapping
    AIP schema
    DIP schema

Submission
► Transfer
    sFTP, disk, tape, sneakernet
    Deposit on Point of Ingest server
► Data wrangling
    Virus scan
    Normalize process during initial transfers
    Run manual processes to prep data

Create AIP
► Simple schema for single files
    Normalized metadata
    Received metadata
    Record (typically Base64 encoding)
► Compound schema for multiple files
    Normalized metadata
    Received metadata
    Structural metadata
    Files (typically Base64 encoding)

Ingest
► Update administrative catalog
► Encapsulate AIPs in Superpackage
► Expose to LOCKSS
    Automatic integrity checking
    Automatic error correction
    Distributed preservation model
    Sustainable business model
    Inexpensive
    Testing a 16TB system

Dissemination
► Create DIP
    Browser friendly format
► Update public catalog
► Publish to website

Simplification
► Community of shared practice
    Many hands make light work
    Resource sharing
    Support network
► Generic, modular processes
    Code reuse
► Standard schema
    Catalog databases
    Packages

Automated Processing
► Open source v. proprietary software
► Middleware
    Microsoft BizTalk
► Metadata tools
    New Zealand Metadata Extractor
    BagIt file transfer validation
► Agile-Scrum project management methodology

Project Status – Completed
► Technical infrastructure installed
► Core metadata defined
► Schema for a simple AIP
► Developed administrative catalog
► AZ marriage certificates ingested, their metadata transformed and created, and the records packaged as AIPs and deposited in LOCKSS
► Demonstrated reuse of code by adapting rules for marriage certificates to SC Public Service Commission orders

Project Status – To Do
► Complete Administrative Catalog Interface
► Develop AIP for compound records
► Develop DIP
► Develop Public Catalog web interface
► Write rules to ingest additional records and publications
► Project to be completed by December 2010

State Initiatives Symposium
► Results from NDIIPP State Initiatives Projects
    Arizona
    Minnesota
    North Carolina
    Washington
► Possibly with Best Practices Exchange
    Phoenix
    Fall 2010

For more information
► http://www.pedalspreservation.org/
► Principal Investigator
    Richard Pearce-Moses
    rpm@lib.az.us
► Project Coordinator
    Sara Muth
    smuth@lib.az.us
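
Sketch: Simple AIP
The "Create AIP" slide describes a simple schema for single files that holds normalized metadata, received metadata, and the record itself, typically Base64 encoded. The following is a minimal Python sketch of such a package, assuming an XML wrapper. The element names (`aip`, `normalizedMetadata`, `receivedMetadata`, `record`) and the function `build_simple_aip` are hypothetical and are not taken from the PeDALS schema.

```python
import base64
import xml.etree.ElementTree as ET


def build_simple_aip(normalized, received, record_bytes):
    """Wrap one record and its two metadata sets in a single XML string.

    All element and attribute names are illustrative assumptions.
    """
    aip = ET.Element("aip")

    # Metadata mapped onto the common schema.
    norm = ET.SubElement(aip, "normalizedMetadata")
    for name, value in normalized.items():
        ET.SubElement(norm, "element", name=name).text = value

    # Metadata exactly as received from the producer.
    recv = ET.SubElement(aip, "receivedMetadata")
    for name, value in received.items():
        ET.SubElement(recv, "element", name=name).text = value

    # The record itself, Base64 encoded so binary content is XML-safe.
    record = ET.SubElement(aip, "record", encoding="base64")
    record.text = base64.b64encode(record_bytes).decode("ascii")

    return ET.tostring(aip, encoding="unicode")


xml_text = build_simple_aip(
    {"item_title": "Marriage certificate"},
    {"doc_title": "  Marriage   certificate "},
    b"%PDF-1.4 example bytes",
)
print(xml_text)
```

Decoding the `record` element with `base64.b64decode` returns the original bytes, which makes it possible to compare them against a checksum recorded at transfer.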