Overwhelmed by Large-Scale Digitization Projects?

Xiaocan (Lucy) Wang
Digital Repository Librarian
Eric Holt
University Archivist
Cunningham Memorial Library
Indiana State University
Agenda

Project background
Implementation
 Equipment
 Software choices
 Process
 Ingestion
 Workflow
Outcomes
Lessons learned
Conclusion
Project Background

Indiana State University
Project Background

ETD (electronic theses and dissertations)
 ETD Digital Initiative
 2010 and onward
 Access
Project background (cont.)

RTD (retrospective theses and dissertations)
 Number: 3,802
 Where: Archives + Library basement
 Condition: most in usable condition, but…
 Access
Project Background (cont.)

Purposes
 Centralize ETD & RTD
 Improve access, search and retrieval
 Support teaching, learning and research
 Improve preservation
Project Background (cont.)

Considerations
 Format
 Copyright
 Privacy
Equipment

Bookdrive DIY
Disclosure

Not currently or previously an employee of the corporations whose products I discuss
I am not compensated for my comments or opinions
Older software version being used
Capture New Book window
Capture in action
Batch entry
IrfanView
GIMP

Open-source alternative to Photoshop
Batch processing requires an additional plugin
Supervisor unfamiliarity
Photoshop

Can record an action to perform batch processing
Graphical interface for setting up the recorded action
Changing DPI
Color
Grayscale
B/W
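The DPI and color-mode choices above drive both legibility and file size. As a rough illustration (not the project's Photoshop action), this minimal sketch computes pixel dimensions at a given DPI, converts an RGB pixel to grayscale using the ITU-R BT.601 luma weights that Pillow documents for its "L" mode, thresholds gray to black/white, and estimates uncompressed size per mode. The letter-size page (8.5 x 11 in), the threshold of 128, and all function names are assumptions for illustration.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page captured or resampled at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def to_grayscale(rgb: tuple[int, int, int]) -> int:
    """Luma with ITU-R BT.601 weights (integer arithmetic, truncating)."""
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) // 1000


def to_bw(gray: int, threshold: int = 128) -> int:
    """Fixed-threshold black/white; image editors often dither instead."""
    return 255 if gray >= threshold else 0


def raw_bytes(width_px: int, height_px: int, bits_per_pixel: int) -> int:
    """Uncompressed size: 24 bits for color, 8 for grayscale, 1 for B/W."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


# Hypothetical letter-size page resampled to 300 DPI.
w, h = pixel_dimensions(8.5, 11, 300)
for mode, bits in (("Color", 24), ("Grayscale", 8), ("B/W", 1)):
    print(f"{mode:9} {raw_bytes(w, h, bits):>12,} bytes")
```

At 300 DPI a letter page is 2550 x 3300 pixels, so before any compression grayscale needs one third of the raw bytes of 24-bit color, and B/W one twenty-fourth.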
PDF Compression

All items being converted are compressed
Some formats compress better than others
Compression artifacts can also become visible
Original page image is visible
Searchable (OCR) text layer is hidden behind the image
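Why some content compresses better than other content can be seen with Python's zlib module, which implements deflate, the same algorithm behind the PDF FlateDecode filter. This minimal sketch compares a uniform buffer (standing in for blank paper) with seeded random bytes (standing in for grain or noise); both buffers are synthetic and exaggerate the contrast found in real scans. Deflate is lossless, so it never introduces artifacts; visible artifacts come from lossy encoders such as JPEG (DCTDecode).

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Deflate-compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


SIZE = 100_000
blank_page = bytes([255]) * SIZE                              # uniform white, like bare paper
rng = random.Random(42)                                       # seeded so runs are repeatable
noisy_page = bytes(rng.randrange(256) for _ in range(SIZE))   # grain/noise stand-in

for label, data in (("blank", blank_page), ("noisy", noisy_page)):
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data                    # lossless round trip
    print(f"{label}: {len(packed):,} of {SIZE:,} bytes ({compression_ratio(data):.1%})")
```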
First Review

All pages present?
All text legible?
 No shadows covering text?
 Page in focus?
Essential color elements retained?
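Legibility, focus, and color still need a reviewer's eyes, but the "All pages present?" check can be partly pre-screened. A minimal sketch, assuming a hypothetical file naming convention of `<name>_<sequence>.jpg` (or `.tif`) and an expected page count taken from the original volume; it counts capture sequence numbers, not printed page numbers, so roman-numeral front matter would need its own handling.

```python
import re

# Hypothetical naming convention: <anything>_<sequence number>.<image extension>
PAGE_FILE = re.compile(r"_(\d+)\.(?:jpe?g|tiff?)$", re.IGNORECASE)


def missing_pages(filenames: list[str], expected_count: int) -> list[int]:
    """Return sequence numbers 1..expected_count that have no captured image."""
    found = set()
    for name in filenames:
        match = PAGE_FILE.search(name)
        if match:
            found.add(int(match.group(1)))
    return sorted(set(range(1, expected_count + 1)) - found)


captured = ["smith1968_0001.jpg", "smith1968_0002.jpg", "smith1968_0004.jpg"]
print(missing_pages(captured, 5))  # → [3, 5]
```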
PDF/A

Copy saved to Archives server
Only accessible to staff
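Saving the archival copy is safer when the copy is verified. A minimal sketch, assuming a plain filesystem directory stands in for the Archives server (the `archive_copy` helper and file names are hypothetical): it copies the PDF and compares SHA-256 digests of source and copy. This checks bit-level integrity only; PDF/A conformance itself needs a validator such as veraPDF.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large PDFs need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_copy(src: Path, dest_dir: Path) -> tuple[Path, str]:
    """Copy `src` into `dest_dir` and confirm the copy is bit-identical."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also tries to preserve metadata such as timestamps
    expected = sha256_of(src)
    if sha256_of(dest) != expected:
        raise IOError(f"checksum mismatch for {dest}")
    return dest, expected


# Demo in a throwaway directory standing in for the Archives server.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "smith1968.pdf"
    source.write_bytes(b"%PDF-1.4 example bytes")
    copied, checksum = archive_copy(source, Path(tmp) / "archives")
    print(copied.name, checksum[:12])
```

Recording the returned checksum alongside the file gives a baseline for later fixity checks.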
Final review and cleanup

Review metadata
Correct if necessary
Approve and publish
Remove original camera images, processed images, and extra copies of the PDF
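Cleanup is the one irreversible step, so a dry run that only lists candidates can help. A minimal sketch with a hypothetical folder layout and helper name: it flags image files and every PDF except the designated keeper, deletes nothing, and assumes the keeper has already been verified on the Archives server and published.

```python
from pathlib import PurePosixPath

# Working-file types this sketch treats as disposable (an assumption).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def files_to_remove(paths: list[str], keeper: str) -> list[str]:
    """Dry run: list camera/processed images and extra PDFs, sparing `keeper`.

    Deletes nothing. Run only after the keeper PDF is verified and published.
    """
    keep = PurePosixPath(keeper)
    doomed = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path == keep:
            continue
        suffix = path.suffix.lower()
        if suffix in IMAGE_SUFFIXES or suffix == ".pdf":
            doomed.append(str(path))
    return doomed


working = [
    "camera/smith1968_0001.jpg",
    "processed/smith1968_0001.tif",
    "pdf/smith1968.pdf",
    "pdf/smith1968_copy.pdf",
    "notes.txt",
]
print(files_to_remove(working, "pdf/smith1968.pdf"))
```

Anything not matching the listed suffixes (here `notes.txt`) is left alone, so unexpected files surface for a person to decide on.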
Workflow

Imaging original theses or dissertations
Workflow (cont.)

Processing image files
Workflow (cont.)

Converting to PDF/A
Workflow (cont.)

Publishing on the ISU institutional repository (IR)
Outcomes

Volumes finished: 848
Average volume size: 96 pages
Average student time per volume: 1.3 hours
Average supervisor time per volume: 5-10 minutes
Average file size: 5.5 MB
Total disk space: 4.6 GB
Approximate cost: $15-18
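The outcome figures can be cross-checked with simple arithmetic. This sketch multiplies the reported averages by the 848 finished volumes and, as a naive extrapolation that assumes the averages hold, by the RTD volumes still remaining out of 3,802. The projection is illustrative only, not a reported result.

```python
TOTAL_RTD = 3_802        # retrospective theses and dissertations
FINISHED = 848
AVG_PAGES = 96
AVG_STUDENT_HOURS = 1.3
AVG_FILE_MB = 5.5

pages_scanned = FINISHED * AVG_PAGES
student_hours = FINISHED * AVG_STUDENT_HOURS
disk_mb = FINISHED * AVG_FILE_MB

# Naive extrapolation: assumes the remaining volumes match the averages so far.
remaining = TOTAL_RTD - FINISHED
projected_hours = remaining * AVG_STUDENT_HOURS
projected_mb = remaining * AVG_FILE_MB

print(f"Pages scanned so far:   {pages_scanned:,}")
print(f"Student hours so far:   {student_hours:,.1f}")
print(f"Disk used so far:       {disk_mb:,.0f} MB ({disk_mb / 1024:.2f} GiB)")
print(f"Volumes remaining:      {remaining:,}")
print(f"Projected student hrs:  {projected_hours:,.1f}")
print(f"Projected disk:         {projected_mb:,.0f} MB")
```

848 volumes x 5.5 MB gives 4,664 MB (about 4.55 GiB, or 4.66 decimal GB), in line with the reported 4.6 GB given that 5.5 MB is itself an average.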
Worth It?

Centralize
Improve access via
 Digital repository
 Search engines
 Digital repository registries
 WorldCat
Worth It? (cont.)

Support teaching, learning and research
Improve preservation strategies
 Multiple digital copies
 Backup
 Bitstream preservation
 Distributed preservation network via MetaArchive Cooperative
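Multiple copies only protect bitstreams if someone checks that they still agree. MetaArchive builds on LOCKSS, whose nodes compare their copies; the sketch below is a much simpler stand-in, not the LOCKSS polling protocol. It hashes each copy with SHA-256 and reports copies that disagree with the majority. The copy names and contents are hypothetical.

```python
import hashlib
from collections import Counter


def audit_copies(copies: dict[str, bytes]) -> list[str]:
    """Return names of copies whose SHA-256 digest disagrees with the majority.

    Raises ValueError when no digest is held by more than half of the copies.
    """
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        raise ValueError("no majority digest; manual review needed")
    return sorted(name for name, digest in digests.items() if digest != majority_digest)


# Hypothetical copies of one thesis PDF; the third has one altered byte.
copies = {
    "archives_server": b"%PDF-1.4 thesis bytes",
    "backup": b"%PDF-1.4 thesis bytes",
    "preservation_node": b"%PDF-1.4 thesis bytez",
}
print(audit_copies(copies))  # → ['preservation_node']
```

A damaged copy flagged this way can then be replaced from one of the agreeing copies.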
Lessons learned

Control quality: monochrome and grayscale
Supervise students
Add MARC 856 field (Electronic Location and Access) to catalog records
Secure continued funding
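The 856 field carries the link from a catalog record to the digitized copy. A minimal sketch, assuming MARCMaker-style mnemonic text as the output format and a hypothetical repository URL; a real load would go through the library's ILS or a MARC library. First indicator 4 means HTTP; the second indicator is 0 (resource), 1 (version of resource, e.g. a digitized copy linked from the print record), or 2 (related resource).

```python
from typing import Optional


def marc_856(url: str, ind2: str = "1", link_text: Optional[str] = None,
             note: Optional[str] = None) -> str:
    """Build an 856 (Electronic Location and Access) field as MARCMaker-style text.

    $u holds the URI, $y optional link text, $z optional public note.
    Assumes no value contains a literal "$".
    """
    if ind2 not in {"0", "1", "2"}:
        raise ValueError("second indicator must be 0, 1, or 2")
    field = f"=856  4{ind2}$u{url}"
    if link_text:
        field += f"$y{link_text}"
    if note:
        field += f"$z{note}"
    return field


print(marc_856("https://repository.example.edu/items/1234", note="Full text (PDF)"))
```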
Conclusion

Complex
Various issues
 Funding
 Technical standards
 Quality control
 Format selection
 In-house vs. outsourcing
 Metadata
 Delivery
 Preservation
 Rights management
 Workflow development
Contact info
Xiaocan (Lucy) Wang
Xiaocan.wang@indstate.edu
Eric Holt
Eric.Holt@indstate.edu