http://courses.ischool.utexas.edu/galloway/2009/spring/INF392K/preprocessingstepsv22009.doc

Preprocessing steps before ingest into DSpace
Sarah Kim, 01/09/2009; revised by Patricia Galloway 02/10/2009
1. Collection accession
- Gather information about the collection, the expected results of the project, the status of
property rights, access limitations, privacy issues, and so forth, through interviews with the
record creator or the archivist from the owning archive
- Where possible, sample any paper materials and learn more about the creator and the rest
of the collection
- Take the collection into secure custody if possible; otherwise, check out a prepared laptop
for on-site use
2. Create an inventory of physical media
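An inventory of physical media can be as simple as a spreadsheet. As a minimal sketch, the Python below writes one as CSV using only the standard library; the `MediaItem` class and its field names are hypothetical, chosen to hold the details gathered in step 3A (labels, photos, condition), not a required schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Iterable, TextIO


@dataclass
class MediaItem:
    """One physical carrier in the collection (hypothetical field set)."""
    media_id: str    # identifier assigned at accession
    media_type: str  # e.g. "3.5-inch floppy", "CD-R"
    label_text: str  # transcription of any label on the media
    photo_file: str  # filename of the photo documenting the label
    condition: str   # visible physical condition
    notes: str = ""


def write_inventory(items: Iterable[MediaItem], out: TextIO) -> int:
    """Write the inventory as CSV with a header row; return the number of data rows."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(MediaItem)])
    writer.writeheader()
    count = 0
    for item in items:
        writer.writerow(asdict(item))
        count += 1
    return count


if __name__ == "__main__":
    buf = io.StringIO()
    write_inventory(
        [MediaItem("M001", "3.5-inch floppy", "Drafts 1994", "M001.jpg", "label faded")],
        buf,
    )
    print(buf.getvalue())
```

When writing to a file rather than a string buffer, open it with `newline=""`, as the `csv` module documentation recommends.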
3. Collection assessment (if there are too many items, assess a sample)
A. Current storage (e.g. storage media, server)
- Observe the condition of the current storage (e.g. physical damage to storage media)
- Record any labels on the media and document them with digital photos
- Determine the equipment required (e.g. drives, software to recover damaged items)
- Assess accessibility and readability of the current storage media
B. Bitstreams/Files
- Attempt to determine the native operating system, if possible
o Use clues from the age and type of the media and from existing collection documentation
- Check for and record the presence of viruses and any exact copies of bitstreams
- Create a file catalog
- Record the quantity of material in the collection, the types of files, and the applications
used to render the bitstreams
- Note any original file-naming convention, file organization or classification scheme, and
so forth assigned by the record creator, since these should be preserved as they are
- Assess accessibility and readability of files
- Transfer bitstreams/files from the original storage to a neutral storage device for
assessment and processing
o Do not copy virus-infected files
o Write-protect the media if possible
o Attempt to read the media directory
o Calculate a message digest for each file
o Create a mirror directory on the neutral storage device
o Copy the files to neutral storage
o Calculate message digests for the copies and verify that they match the originals
o Retain these message digests; you will need them for ingest
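The transfer and fixity steps above can be sketched in Python as follows. This is a minimal illustration, assuming the media are already mounted read-only (write protection and virus scanning happen outside the script) and that SHA-256 is an acceptable digest; `digest_of` and `copy_and_verify` are hypothetical names. It recreates the original directory tree on the neutral storage, copies each file, and stops if any copy's digest differs from the original's.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def digest_of(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Return the hex message digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_verify(
    source: Path,
    target: Path,
    skip: Iterable[Path] = (),
    algorithm: str = "sha256",
) -> Dict[str, str]:
    """Mirror `source` under `target`, checking each copy against the original's digest.

    Files in `skip` (e.g. flagged by a separate virus scan) are not copied.
    Returns {relative POSIX path: digest} to retain for ingest.
    """
    skipped = {p.resolve() for p in skip}
    target.mkdir(parents=True, exist_ok=True)
    # Recreate the directory tree first so that empty folders are preserved too.
    for d in sorted(p for p in source.rglob("*") if p.is_dir()):
        (target / d.relative_to(source)).mkdir(parents=True, exist_ok=True)
    manifest: Dict[str, str] = {}
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        if src.resolve() in skipped:
            continue
        rel = src.relative_to(source)
        before = digest_of(src, algorithm)   # digest of the original
        dst = target / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)               # copy2 also tries to keep timestamps
        after = digest_of(dst, algorithm)    # digest of the copy
        if before != after:
            raise RuntimeError(f"digest mismatch for {rel}: {before} != {after}")
        manifest[rel.as_posix()] = before
    return manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "media")
        (src / "letters").mkdir(parents=True)
        (src / "letters" / "a.txt").write_bytes(b"hello")
        for rel, digest in copy_and_verify(src, Path(tmp, "mirror")).items():
            print(digest, rel)
```

The returned manifest is what the last step asks you to retain. If the digests are meant to be compared with the checksums DSpace records for bitstreams, which have typically been MD5, pass `algorithm="md5"`.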
C. Overall
- Assess potential risks to files and media arising from their physical and technical condition
- Identify special issues that need to be addressed
- Document assessment results (e.g. generate condition report)
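Some of the figures a condition report draws on (quantity of material, file types, exact copies of bitstreams) can be gathered mechanically. The sketch below is one way to do it, assuming SHA-256 digests are used to identify exact copies and filename extensions serve as a rough proxy for file type; `summarize` is a hypothetical helper. Extensions are only a first clue, since files from older systems often lack them, so a signature-based format identification tool gives stronger evidence.

```python
import hashlib
import tempfile
from collections import Counter, defaultdict
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def summarize(root: Path) -> dict:
    """Collect figures for a condition report on the files under `root`:
    file count, total bytes, counts per filename extension, and groups
    of byte-identical files (exact copies)."""
    by_ext: Counter = Counter()
    by_digest: Dict[str, List[str]] = defaultdict(list)
    total_bytes = 0
    for p in sorted(q for q in root.rglob("*") if q.is_file()):
        by_ext[p.suffix.lower() or "(none)"] += 1
        total_bytes += p.stat().st_size
        by_digest[sha256_of(p)].append(p.relative_to(root).as_posix())
    return {
        "files": sum(by_ext.values()),
        "bytes": total_bytes,
        "extensions": dict(by_ext),
        "duplicate_groups": [paths for paths in by_digest.values() if len(paths) > 1],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.txt").write_bytes(b"x")
        (root / "b.txt").write_bytes(b"x")
        print(summarize(root))
```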
4. Set up a processing plan based on the assessment results
- Set processing priorities and the scope of the project if there are too many items
- Decide the level of service required within the repository
- Decide which significant properties are to be preserved
- Decide the types of technical processing (e.g. create access copies or backup copies in other
file format(s) if necessary, recover damaged or virus-infected items if possible)
- Determine the collection arrangement or structure in the IR (institutional repository):
o Original order as used by the creator (if known)
o Original order as received by the owning archive
o Standard ordering used by the owning archive
- Decide the metadata elements and the methods for extracting or creating metadata
- Resolve any special issues (e.g. access restrictions, duplicate files, original order)
5. Create a SIP (Submission Information Package) agreement based on the processing plan
6. Off-site processing
- Carry out the technical processing
- Prepare a biographical sketch and notes on the collection's scope and content
- Prepare descriptions for sub-communities if necessary
7. Prepare the SIP for batch ingest, or carry out manual ingest for small collections
- Set up collections and communities in DSpace as planned
- Extract and prepare metadata for each item
- Prepare the required files (contents, dublin_core.xml) and command lines for batch ingest
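For batch ingest, DSpace's Simple Archive Format expects one directory per item containing the bitstreams, a `contents` file listing their filenames, and a `dublin_core.xml` file of `dcvalue` elements. The Python below is a minimal sketch of writing one such item directory; the function name, the `item_NNN` directory naming, and the example metadata are hypothetical, and the format details should be checked against the documentation for the DSpace version in use.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple
from xml.sax.saxutils import escape, quoteattr


def write_saf_item(
    archive_dir: Path,
    index: int,
    files: Sequence[Path],
    dc: Sequence[Tuple[str, str, str]],
) -> Path:
    """Create one item directory in DSpace's Simple Archive Format.

    `dc` holds (element, qualifier, value) triples; use "none" when there
    is no qualifier. Bitstreams are copied in and listed in `contents`.
    """
    item_dir = archive_dir / f"item_{index:03d}"   # hypothetical naming scheme
    item_dir.mkdir(parents=True)                   # fails if the item already exists
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<dublin_core>"]
    for element, qualifier, value in dc:
        lines.append(
            f"  <dcvalue element={quoteattr(element)} qualifier={quoteattr(qualifier)}>"
            f"{escape(value)}</dcvalue>"
        )
    lines.append("</dublin_core>")
    (item_dir / "dublin_core.xml").write_text("\n".join(lines) + "\n", encoding="utf-8")

    names: List[str] = []
    for f in files:
        if f.name in names:
            raise ValueError(f"duplicate bitstream name in one item: {f.name}")
        shutil.copy2(f, item_dir / f.name)
        names.append(f.name)
    # One bitstream filename per line in the contents file.
    (item_dir / "contents").write_text("".join(n + "\n" for n in names), encoding="utf-8")
    return item_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp, "letter.txt")
        doc.write_text("Dear Sir", encoding="utf-8")
        item = write_saf_item(Path(tmp, "archive"), 1, [doc], [("title", "none", "Letter to the editor")])
        print((item / "dublin_core.xml").read_text(encoding="utf-8"))
```

Once all items are written, the archive directory is handed to DSpace's batch importer. Current DSpace documentation gives the command in roughly the form `[dspace]/bin/dspace import --add --eperson=<submitter email> --collection=<collection handle or ID> --source=<archive directory> --mapfile=<mapfile>`, but the exact syntax varies by version and should be confirmed locally.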