http://courses.ischool.utexas.edu/galloway/2010/spring/INF392K/preprocessingstepsv32010.doc

Preprocessing steps before ingest into DSpace
1. Collection acquaintance
- Gather information about the collection, the expected results of the project, the status of property rights, access restrictions, privacy issues, and so forth, through interviews with the record creator or the archivist from the owning archive (custodian)
- Create an inventory of physical media
- Where possible, sample any paper materials; learn more about the creator and how or
whether the digital materials are related to any other digital or paper records
- Take the collection into secured custody if possible or necessary (Quinn has a locked cabinet in the IT lab that you may be able to use; check with the instructor); otherwise, check out a prepared laptop for use on site. Above all, the collection must not be damaged or changed in any way
2. Collection assessment (sample items for the assessment if there are too many)
A. Current storage of digital materials (e.g. storage media, server)
- Observe the current storage conditions (e.g. physical damage on storage media)
- Record any labels on the media and document them with digital photos
- Figure out what equipment and software will be required to read and copy from media
(e.g. drives, software to recover damaged items)
- Assess accessibility and readability of the current storage media
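One way to start the readability check on a mounted medium is a short script that tries to read every file and records the ones that fail. The Python sketch below is a minimal illustration, not a prescribed tool. It assumes the medium is already mounted read-only or sits behind a write blocker, because on many systems even reading a file can update its last-access time. The function name `scan_readability` is hypothetical.

```python
import os


def scan_readability(mount_point, chunk_size=1 << 20):
    """Try to read every file under mount_point.

    Returns (readable, unreadable): a list of readable paths and a list of
    (path, error message) pairs. Assumes the medium is mounted read-only.
    """
    readable, unreadable = [], []

    def record_dir_error(err):
        # os.walk calls this when a directory cannot be listed
        unreadable.append((err.filename, str(err)))

    for dirpath, dirnames, filenames in os.walk(mount_point, onerror=record_dir_error):
        dirnames.sort()  # visit folders in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    # read the whole file in chunks; the data itself is discarded
                    while fh.read(chunk_size):
                        pass
                readable.append(path)
            except OSError as exc:
                unreadable.append((path, str(exc)))
    return readable, unreadable
```

Calling `scan_readability("/media/floppy")` (a hypothetical mount point) returns both lists; files reported as unreadable are candidates for recovery software before any copying is attempted.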
B. Bitstreams/Files
- Attempt to determine the native operating system
o Use clues from the age and types of media and from existing collection documentation
- Check for and record any virus infections and any exact duplicate bitstreams
- Create a file catalog
- Record the size of the collection (number of files and total bytes), the file types, and the applications used to render the bitstreams
- Note any original file-naming convention, file organization/classification method, and so forth assigned by the record creator, which should be preserved as is
- Assess accessibility and readability of files
- Transfer bitstreams/files from the original storage to a neutral storage device for
assessing and processing
o Do not copy virus-infected files
o Write-protect the media if possible
o Attempt to read the media directory
o Calculate a message digest (checksum) for each file
o Create a mirror directory structure on the neutral storage device
o Copy the files to the neutral storage device
o Calculate message digests for the copies and verify that they match those of the originals
o Retain these message digests as you will need them for ingest
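The transfer steps above can be sketched as a single routine that mirrors the directory tree, copies each file, and compares digests before and after copying. The sketch assumes MD5 as the message digest (DSpace records MD5 checksums for bitstreams, but any algorithm required by local policy works the same way), a read-only source, and a list of paths already flagged by a virus scanner. `copy_with_fixity` and `write_manifest` are hypothetical names.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_with_fixity(source_root, target_root, skip=()):
    """Mirror source_root into target_root, verifying every copy by digest.

    skip: relative paths not to copy (e.g. files a virus scan flagged).
    Returns {relative path: MD5 digest}; raises OSError on any mismatch.
    """
    skip = {os.path.normpath(p) for p in skip}
    manifest = {}
    for dirpath, dirnames, filenames in os.walk(source_root):
        dirnames.sort()
        rel_dir = os.path.relpath(dirpath, source_root)
        # mirror the directory, including empty ones, on the neutral device
        os.makedirs(os.path.normpath(os.path.join(target_root, rel_dir)), exist_ok=True)
        for name in sorted(filenames):
            rel = os.path.normpath(os.path.join(rel_dir, name))
            if rel in skip:
                continue
            src = os.path.join(source_root, rel)
            dst = os.path.join(target_root, rel)
            before = md5_of(src)  # digest of the original
            shutil.copy2(src, dst)  # copy2 also tries to keep timestamps
            after = md5_of(dst)  # digest of the copy
            if before != after:
                raise OSError(f"digest mismatch for {rel}: {before} != {after}")
            manifest[rel] = before
    return manifest


def write_manifest(manifest, manifest_path):
    """Save digests one per line as 'digest  relative/path' (md5sum's text layout)."""
    with open(manifest_path, "w", encoding="utf-8") as fh:
        for rel, digest in sorted(manifest.items()):
            fh.write(f"{digest}  {rel}\n")
```

`shutil.copy2` is used so that file modification dates, often the only surviving evidence of when a document was created, carry over to the neutral copy where the platform allows it. The saved manifest is one way to retain the digests needed later for ingest.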
C. Overall
- Assess potential risks of files and media regarding physical and technical conditions
- Identify any special issues that need to be addressed
- Document assessment results (e.g. generate condition report)
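The file catalog, duplicate check, and counts of files by type from part B can be sketched in the same way, run against the neutral copy rather than the original media. This is a minimal illustration under stated assumptions: the file extension stands in for file type, although a format-identification tool gives more reliable results, and matching MD5 digests are treated as exact copies (for certainty, `filecmp.cmp(a, b, shallow=False)` compares two files byte by byte). The function names are hypothetical.

```python
import hashlib
import os
from collections import Counter, defaultdict


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_catalog(root):
    """Return one row per file: relative path, size in bytes, extension, MD5."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rows.append({
                "path": os.path.relpath(path, root),
                "size": os.path.getsize(path),
                # the extension is only a clue to file type, not proof of it
                "extension": os.path.splitext(name)[1].lower() or "(none)",
                "md5": file_md5(path),
            })
    return rows


def find_duplicates(rows):
    """Return {md5: [paths]} for every digest shared by two or more files."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["md5"]].append(row["path"])
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def summarize(rows):
    """Return (files per extension, total bytes) for recording collection size."""
    return Counter(row["extension"] for row in rows), sum(row["size"] for row in rows)
```

The catalog rows can be written out as a spreadsheet for the condition report; duplicate groups are only flagged here, since whether to keep them is a decision for the processing plan.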
3. Set up a processing plan based on the assessment results and discussions with the creator or
collection custodian
- Set processing priorities and revise the scope of the project if there are too many items
- Negotiate the level of service required within the repository
- Determine which significant properties must be maintained in renderable form
- Based on these decisions, design a technical processing plan (e.g. create access copies or
backup copies in other file format(s) if necessary, recover damaged or virus-infected
items if possible, etc.)
- Determine the collection arrangement or structure in the institutional repository (IR)
o Original order as used by the creator (if known)
o Original order as received by the owning archive
o Standard ordering used by the owning archive
o Any mapping scheme that relates these orders to one another
- Decide on appropriate metadata elements and methods of extracting/creating metadata for
each individual bitstream
- Resolve any special issues (e.g. issues of access restriction, duplicate files, any original
order) and create an access plan
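A mapping scheme can be as simple as a table that records, for each file, its position in the order received, its original path, and the repository item it will become. The sketch assumes that the received order is the order of the file catalog and that the top-level folder is worth carrying over as a `series` column, for instance to decide which DSpace collection receives the item; both assumptions, and the function name `build_order_map`, are illustrative only.

```python
import csv


def build_order_map(paths_as_received, out_csv):
    """Write a CSV relating each file's received position to a repository item.

    paths_as_received: relative paths in the order the owning archive received
    them (assumed here to be the file catalog order). The top-level folder is
    kept as a hypothetical 'series' column.
    """
    rows = []
    for n, rel in enumerate(paths_as_received, start=1):
        parts = rel.replace("\\", "/").split("/")
        series = parts[0] if len(parts) > 1 else "(top level)"
        rows.append({
            "item": f"item_{n:04d}",
            "received_position": n,
            "original_path": rel,
            "series": series,
        })
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["item", "received_position", "original_path", "series"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Keeping the original path in the table means the creator's arrangement can still be reconstructed even if the repository imposes a different order.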
4. Create and execute a SIP (Submission Information Package) agreement based on the processing plan
5. Off-site processing
- Carry out technical processing according to the processing plan
- Prepare a biographical sketch or institutional history, and scope and content notes for the community, subcommunities, and collections
6. Prepare SIP for batch ingest or carry out manual ingest for small collections
- Set up collections and communities in DSpace as planned, so they will be ready to
receive the materials
- Enter aggregate descriptive metadata
- Extract and prepare metadata for each item
- Prepare required files (contents, metadata) and command lines for batch ingest
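For batch ingest, DSpace reads items in its Simple Archive Format: one directory per item, holding the bitstreams, a `contents` file listing their filenames one per line, and a `dublin_core.xml` file of `dcvalue` elements. The sketch below writes one such item directory; the function name and its arguments are hypothetical, and the format details should be checked against the documentation for the DSpace version in use. The import itself is then run from the command line, along the lines of `[dspace]/bin/dspace import --add --eperson=... --collection=... --source=... --mapfile=...`, though the launcher and options differ between releases.

```python
import os
import shutil
from xml.sax.saxutils import escape, quoteattr


def write_saf_item(item_dir, files, dc_values):
    """Create one Simple Archive Format item directory.

    files: paths of bitstreams to include (copied into item_dir).
    dc_values: (element, qualifier, value) triples; qualifier "none" means unqualified.
    """
    os.makedirs(item_dir, exist_ok=True)
    names = []
    for src in files:
        name = os.path.basename(src)
        shutil.copy2(src, os.path.join(item_dir, name))
        names.append(name)
    # 'contents' lists one bitstream filename per line
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        fh.write("".join(n + "\n" for n in names))
    # dublin_core.xml holds one dcvalue element per metadata value
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<dublin_core>"]
    for element, qualifier, value in dc_values:
        lines.append(
            f"  <dcvalue element={quoteattr(element)} qualifier={quoteattr(qualifier)}>"
            f"{escape(value)}</dcvalue>")
    lines.append("</dublin_core>")
    with open(os.path.join(item_dir, "dublin_core.xml"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
```

Bitstreams are copied in by filename only, so two files with the same name cannot go into one item without renaming.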
7. Provide aggregated files for uploading or assist with uploading
8. Review structure and metadata and test access to be sure that all perform as desired
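One concrete access test is to download the ingested bitstreams and confirm that each digest retained during transfer still matches. The sketch assumes the digests were kept in the two-space `digest  path` layout that `md5sum` writes, and it matches by digest rather than by name because downloaded files need not keep their original folder layout; `unmatched_entries` is a hypothetical name.

```python
import hashlib
import os


def md5_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def unmatched_entries(manifest_path, download_dir):
    """List manifest paths whose digest matches no file downloaded from the repository."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(download_dir):
        for name in filenames:
            found.add(md5_file(os.path.join(dirpath, name)))
    unmatched = []
    with open(manifest_path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue
            digest, rel = line.rstrip("\n").split("  ", 1)
            if digest not in found:
                unmatched.append(rel)
    return unmatched
```

An empty result means every retained digest was found among the downloads. Because matching is by content, one download satisfies every manifest entry with identical content, so exact duplicates are not distinguished by this check.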
9. Prepare a simple user guide for the benefit of the collection creator and/or custodian
Original draft Sarah Kim, 01/09/2009; revised Patricia Galloway 02/10/2009 and 2/4/2010