Finding a New Way
Using Automated Business Rules to Process Electronic Records
Richard Pearce-Moses
Deputy Director for
Technology & Information Resources
Arizona State Library, Archives and Public Records
PeDALS
Persistent Digital Archives and Library System
► Arizona State Library, Archives and Public Records
► Florida State Library and Archives
► South Carolina Department of Archives & History
► South Carolina State Library
► New York State Library
► New York State Archives
► Wisconsin State Archives
► Two more partners
► Kudos to Washington State Archives
Curatorial Rationale
► Question traditional, paper-based practices in order to adapt them to the digital era
 Appraisal
 Acquisition
 Arrangement and description
 Housing and storage
 Reference and access
 Preservation
► Preserving archival principles of provenance, context,
collective control, and authenticity and integrity
Technical Goals
►To demonstrate the use of middleware to
implement business rules in software as an
integrated workflow to process collections of
records and publications
►To build “digital stacks” using LOCKSS as the basis
of an inexpensive storage network that can
preserve the authenticity and integrity of the
materials.
Additional Goals
►To build a community of shared practice
that meets the needs of a wide range of
repositories
 For best practices ~ appropriate practices,
what works, what’s practical
 For resource sharing ~ avoid redundant work
►To remove barriers to preservation by
keeping costs as low as possible
Preliminary Results:
A New Research Agenda
►Vulcan mind meld
 Immediate understanding, no confusion
►Cloning
 No time wasted in job search, known results
►Time travel
 All the time you need while meeting deadlines
PeDALS at 50,000 feet
►Based on OAIS Reference Model
►Metadata
 Transforms and normalizes received metadata
 Enhances received metadata
►Archival Information Package
 Creates and stores in LOCKSS
►Dissemination Information Package
 Creates and publishes to the web
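The metadata steps above (normalize and enhance received metadata while keeping what was received) can be sketched in Python. This is a minimal illustration under stated assumptions, not PeDALS code: the list of date formats, the helpers `normalize_date` and `normalize_and_enhance`, and the `series` field are all hypothetical.

```python
from datetime import datetime

# Hypothetical date layouts an agency might send; each record series
# would need its own list, agreed on during transfer negotiation.
KNOWN_DATE_FORMATS = ["%m/%d/%Y", "%d-%b-%Y", "%Y%m%d", "%Y-%m-%d"]

def normalize_date(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD) or raise ValueError."""
    text = raw.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known layout
    raise ValueError(f"unrecognized date: {raw!r}")

def normalize_and_enhance(received: dict, series: str) -> dict:
    """Keep received metadata as delivered; add normalized and enhanced fields."""
    return {
        "received": dict(received),                 # preserved untouched
        "date": normalize_date(received["date"]),   # normalized value
        "series": series,                           # enhancement supplied by the archives
    }
```

Keeping the received metadata alongside the normalized values means a rule error can be corrected later without going back to the agency.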
Appropriate Record Sets
► Ideal scenario
 Created, stored in a recordkeeping system
 Indexed
► Likely to succeed
 Certificates
 Email
 Indexed documents
► Less likely to succeed
 Hard drives with no index
► Records numerous and consistent enough to be processed by rules
Curatorial Rationale
►Focus on why, not just how
►Strategic shift in how we work
 Not limited to doing things differently
 Doing different things
►Curators work with rules, not records
 Describe business processes (rules)
 Monitor the process for quality assurance
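One way to picture "curators work with rules, not records" is a small rule table applied to every record in a batch, with a summary report the curator monitors for quality assurance. The rule table, field names, and functions below are hypothetical and are not the project's BizTalk implementation.

```python
import re

# Hypothetical business rules for one record series. Curators maintain
# this table; they do not inspect records one at a time.
RULES = [
    # (field, pattern the whole value must match, message on failure)
    ("certificate_no", r"\d{4}-\d{6}", "certificate number malformed"),
    ("county", r"[A-Z][A-Za-z ]+", "county missing or malformed"),
]

def apply_rules(record: dict, rules=RULES) -> list[str]:
    """Return the rule violations for one record (empty list if it passes)."""
    problems = []
    for field, pattern, message in rules:
        value = record.get(field, "")
        if not re.fullmatch(pattern, value):
            problems.append(f"{field}: {message}")
    return problems

def quality_report(records: list[dict]) -> dict:
    """Summarize a batch so the curator monitors the process, not each record."""
    failures = {}
    for index, record in enumerate(records):
        problems = apply_rules(record)
        if problems:
            failures[index] = problems
    return {"total": len(records), "failed": len(failures), "details": failures}
```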
Metadata and Queries
►Single schema
 Administration, discovery, preservation
►Elements common to all government records
 Definition and cross-walks
 Rationale: what is it, who uses it, and for what purpose
Example: Item Title
►Definition: The word or phrase, taken from a prescribed source, by which a work is known
►Rationale: Serves as a "handle" to represent the object at an abstract level in lists, such as search results. A supplied title should contain enough information to help patrons select materials. Because the date is a preferred element and is included in search results by default, the title need not include date information.
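A rule that supplies titles following this definition might look like the sketch below. It assumes a hypothetical marriage-certificate record with a `parties` list; the title names the record type and the parties and deliberately omits the date, per the rationale above.

```python
def supply_title(record: dict) -> str:
    """Build a supplied title for a (hypothetical) marriage-certificate record.

    The title names the record type and the parties so patrons can pick
    the right item from a results list. The date is left out because it
    is displayed separately in search results.
    """
    parties = " and ".join(
        f"{party['surname']}, {party['given']}" for party in record["parties"]
    )
    return f"Marriage certificate: {parties}"
```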
Integrated Workflow
Creation
►Prepare Submission Information Package
 Extract records for transmission
 Extract metadata
 Create shipping manifest
►Negotiation
 File formats for records, metadata, shipping manifest
 Transfer methodology
 Frequency of transfer
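A shipping manifest can be as simple as one checksum per extracted file. This sketch assumes SHA-256 and a "checksum, two spaces, relative path" line layout similar to a BagIt payload manifest; the actual manifest format is one of the items negotiated with the creating agency, and `build_manifest` is a hypothetical helper.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large records need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(export_dir: Path) -> str:
    """Return one 'checksum  relative/path' line per extracted file, sorted by path."""
    lines = []
    for path in sorted(p for p in export_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(export_dir).as_posix()
        lines.append(f"{sha256_of(path)}  {rel}")
    return "\n".join(lines) + "\n"
```

The archives can recompute the same checksums on receipt, so the manifest doubles as the first fixity record for the transfer.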
Description
►Traditional archival description
 Provenance
 Series
 Acquisition
►Rules-based description
 Metadata mapping
 AIP schema
 DIP schema
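Rules-based metadata mapping amounts to a crosswalk from an agency's field names to the common schema. In this hypothetical sketch the agency field names (`CERT_NUM`, `LIC_DATE`, `CNTY`) are invented for illustration; fields without a mapping are returned rather than dropped, so they can still travel with the AIP as received metadata.

```python
# Hypothetical crosswalk from one agency's export fields to the common schema.
MARRIAGE_CROSSWALK = {
    "CERT_NUM": "identifier",
    "LIC_DATE": "date",
    "CNTY": "county",
}

def crosswalk(received: dict, mapping: dict) -> tuple[dict, dict]:
    """Split received metadata into mapped (common schema) and unmapped fields."""
    mapped, unmapped = {}, {}
    for key, value in received.items():
        if key in mapping:
            mapped[mapping[key]] = value
        else:
            unmapped[key] = value  # kept for the AIP's received metadata
    return mapped, unmapped
```

Reusing the code for another series means writing a new mapping table, not new logic.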
Submission
►Transfer
 sFTP, disk, tape, sneakernet
 Deposit on Point of Ingest server
►Data wrangling
 Virus scan
 Normalize process during initial transfers
 Run manual processes to prep data
Create AIP
►Simple schema for single files
 Normalized metadata
 Received metadata
 Record (typically Base64 encoding)
►Compound schema for multiple files
 Normalized metadata
 Received metadata
 Structural metadata
 Files (typically Base64 encoding)
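The AIP layouts above could be assembled as a single XML document with Base64-encoded content. This sketch uses Python's standard `xml.etree.ElementTree`; the element names are illustrative and are not the PeDALS AIP schema. Passing one file gives something close to the simple form; passing several gives the compound form with structural metadata.

```python
import base64
import xml.etree.ElementTree as ET

def build_aip(normalized: dict, received_xml: str, files: dict[str, bytes]) -> bytes:
    """Assemble an AIP as XML. Element names here are illustrative only."""
    aip = ET.Element("aip")
    norm = ET.SubElement(aip, "normalizedMetadata")
    for name, value in normalized.items():
        ET.SubElement(norm, name).text = value
    # Received metadata is stored verbatim as text, not re-interpreted.
    ET.SubElement(aip, "receivedMetadata").text = received_xml
    # Structural metadata records the order of the component files.
    struct = ET.SubElement(aip, "structuralMetadata")
    payload = ET.SubElement(aip, "files")
    for seq, (filename, content) in enumerate(files.items(), start=1):
        ET.SubElement(struct, "component", seq=str(seq), name=filename)
        ET.SubElement(payload, "file", name=filename).text = (
            base64.b64encode(content).decode("ascii")
        )
    return ET.tostring(aip, encoding="utf-8")
```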
Ingest
►Update administrative catalog
►Encapsulate AIPs in Superpackage
►Expose to LOCKSS
 Automatic integrity checking
 Automatic error correction
 Distributed preservation model
 Sustainable business model
 Inexpensive
 Testing a 16 TB system
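LOCKSS checks integrity and corrects errors by having peers compare their copies of the same content. The sketch below reduces that idea to a majority vote over SHA-256 digests so the principle is visible; it is a simplification and does not implement the actual LOCKSS polling protocol, which is considerably more elaborate. The function name and peer labels are hypothetical.

```python
import hashlib
from collections import Counter

def audit_and_repair(replicas: dict[str, bytes]) -> dict[str, bytes]:
    """Simplified majority-vote audit across copies of one preserved unit.

    Each peer's copy is hashed. If a strict majority agree, dissenting
    copies are replaced with a copy from the majority. This is a teaching
    sketch, not the LOCKSS protocol.
    """
    digests = {peer: hashlib.sha256(data).hexdigest() for peer, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise RuntimeError("no majority; needs human review")
    good_copy = next(replicas[p] for p, d in digests.items() if d == winner)
    return {
        peer: (data if digests[peer] == winner else good_copy)
        for peer, data in replicas.items()
    }
```

Distributing copies across partner institutions is what makes a vote like this meaningful: a single damaged copy is outvoted and repaired.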
Dissemination
►Create DIP
 Browser friendly format
►Update public catalog
►Publish to website
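A browser-friendly DIP can be rendered from catalog fields. This hypothetical sketch writes one HTML page per record with every value escaped; the choice of fields (title, date, page images) and the image URLs are assumptions, not the project's DIP design.

```python
from html import escape

def build_dip_html(title: str, date: str, image_urls: list[str]) -> str:
    """Render a minimal browser-friendly page for one record (illustrative only)."""
    images = "\n".join(
        f'  <img src="{escape(url)}" alt="{escape(title)}">' for url in image_urls
    )
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{escape(title)}</title></head>\n'
        f"<body>\n  <h1>{escape(title)}</h1>\n  <p>Date: {escape(date)}</p>\n"
        f"{images}\n</body></html>\n"
    )
```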
Simplification
►Community of shared practice
 Many hands make light work
 Resource sharing
 Support network
►Generic, modular processes
 Code reuse
►Standard schema
 Catalog databases
 Packages
Automated Processing
►Open source v. proprietary software
►Middleware
 Microsoft BizTalk
►Metadata tools
 National Library of New Zealand Metadata Extraction Tool
 BagIt file transfer validation
►Agile-Scrum project management methodology
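BagIt (originally developed at the Library of Congress and now specified in RFC 8493) validates a transfer by listing a checksum for every payload file under `data/`. The simplified checker below handles only a `manifest-sha256.txt` manifest and ignores tag manifests, `fetch.txt`, and percent-encoded paths; a production workflow would use a complete validator. `validate_bag` is a hypothetical helper.

```python
import hashlib
from pathlib import Path

def validate_bag(bag_dir: Path) -> list[str]:
    """Check a bag's payload against manifest-sha256.txt; return a list of problems."""
    problems = []
    listed = {}
    manifest = bag_dir / "manifest-sha256.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if line.strip():
            checksum, rel_path = line.split(maxsplit=1)
            listed[rel_path.strip()] = checksum.lower()
    # Every payload file on disk must appear in the manifest.
    on_disk = {
        p.relative_to(bag_dir).as_posix()
        for p in (bag_dir / "data").rglob("*")
        if p.is_file()
    }
    for rel_path in sorted(on_disk - set(listed)):
        problems.append(f"not in manifest: {rel_path}")
    # Every manifest entry must exist and match its checksum.
    for rel_path, expected in sorted(listed.items()):
        path = bag_dir / rel_path
        if not path.is_file():
            problems.append(f"missing file: {rel_path}")
        elif hashlib.sha256(path.read_bytes()).hexdigest() != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```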
Project Status – Completed
►Technical infrastructure installed
►Core metadata defined
►Schema for a simple AIP
►Developed administrative catalog
►AZ marriage certificates ingested, metadata transformed and created, and records packaged as AIPs and deposited in LOCKSS
►Demonstrated reuse of code by adapting rules for
marriage certificates to SC Public Service
Commission orders
Project Status – To Do
►Complete Administrative Catalog Interface
►Develop AIP for compound records
►Develop DIP
►Develop Public Catalog web interface
►Write rules to ingest additional records and
publications
►Project to be completed by December 2010
State Initiatives Symposium
►Results from NDIIPP State Initiatives Projects
 Arizona
 Minnesota
 North Carolina
 Washington
►Possibly with Best Practices Exchange
 Phoenix
 Fall 2010
For more information
►http://www.pedalspreservation.org/
►Principal Investigator
 Richard Pearce-Moses, rpm@lib.az.us
►Project Coordinator
 Sara Muth, smuth@lib.az.us