OVERWHELMED BY LARGE-SCALE DIGITIZATION PROJECTS
  Xiaocan (Lucy) Wang, Digital Repository Librarian
  Eric Holt, University Archivist
  Cunningham Memorial Library, Indiana State University

Agenda
  - Project background
  - Implementation
  - Equipment
  - Software choices
  - Process
  - Ingestion
  - Workflow
  - Outcome
  - Lessons learned
  - Conclusion

Project Background
  - Indiana State University

Project Background
  - ETD (electronic theses and dissertations)
  - ETD Digital Initiative
  - 2010 and onward
  - Access

Project Background (cont.)
  - RTD (retrospective theses and dissertations)
  - Number: 3,802
  - Where: Archives + Library basement
  - Condition: most in usable condition, but…
  - Access

Project Background (cont.)
  Purposes
  - Centralize: ETD & RTD
  - Improve access, search and retrieval
  - Support teaching, learning and research
  - Improve preservation

Project Background (cont.)
  Considerations
  - Format
  - Copyright
  - Privacy

Equipment
  - BookDrive DIY

Disclosure
  - Not currently or previously an employee of the corporations whose products I discuss
  - I am not compensated for my comments or opinions
  - Older software version being used

Capture
  - New Book window
  - Capture in action
  - Batch entry

IrfanView

GIMP
  - Open source equivalent to Photoshop
  - Batch processing requires additional plugin
  - Supervisor unfamiliarity

Photoshop
  - Can record action to perform batch processing
  - Graphical interface while setting up recorded action

Changing DPI
  - Color
  - Grayscale
  - B/W

PDF Compression
  - All items being converted are compressed
  - Some formats compress better than others
  - Compression artifacts can also become visible

  - Original image of page is visible
  - Searchable text layer is hidden

First Review
  - All pages present?
  - All text legible?
  - No shadows covering text?
  - Page in focus?
  - Essential color elements retained?

PDF/A
  - Copy saved to Archives server
  - Only accessible to staff

Final Review and Cleanup
  - Review metadata
  - Correct if necessary
  - Approve and publish
  - Remove original camera images, processed images, and extra copies of PDF

Workflow
  - Imaging original theses or dissertations

Workflow (cont.)
  - Processing image files

Workflow (cont.)
  - Converting to PDF/A

Workflow (cont.)
  - Publishing on ISU IR

Outcomes
  - Volumes finished: 848
  - Average volume size: 96 pages
  - Average student time: 1.3 hours
  - Average supervisor time: 5-10 minutes
  - Average file size: 5.5 MB
  - Total disk space: 4.6 GB
  - Approximate cost: $15-18

Worth It?
  - Centralize
  - Improve access
    - Via digital repository
    - Search engines
    - Digital repository registries
    - WorldCat

Worth It? (cont.)
  - Support teaching, learning and research
  - Improve preservation strategies
    - Multiple digital copies
    - Backup
    - Bitstream preservation
    - Distributed preservation network via MetaArchive Cooperative

Lessons Learned
  - Control quality: monochrome and grayscale
  - Supervise students
  - Add MARC 856 field
  - Secure continued funds

Conclusion
  - Complex
  - Various issues
    - Funding
    - Technical standards
    - Quality control
    - Format selection
    - In-house vs. outsourcing
    - Metadata
    - Delivery
    - Preservation
    - Rights management
    - Workflow development

Contact Info
  - Xiaocan (Lucy) Wang: Xiaocan.wang@indstate.edu
  - Eric Holt: Eric.Holt@indstate.edu
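
Supplementary note on the Outcomes figures: the reported averages can be cross-checked with simple arithmetic. The sketch below is not part of the original slides. The function names are illustrative, and the projection for the remaining volumes assumes the reported averages continue to hold, which the slides do not state.

```python
# Figures taken from the Project Background and Outcomes slides.
TOTAL_RTD_VOLUMES = 3802    # retrospective theses and dissertations
VOLUMES_FINISHED = 848
AVG_FILE_SIZE_MB = 5.5
AVG_STUDENT_HOURS = 1.3


def estimated_disk_gb(volumes: int, avg_mb: float, mb_per_gb: int = 1000) -> float:
    """Estimate total disk space from volume count and average file size."""
    return volumes * avg_mb / mb_per_gb


def student_hours(volumes: int, avg_hours: float) -> float:
    """Estimate total student time for a given number of volumes."""
    return volumes * avg_hours


if __name__ == "__main__":
    # Assumption: the slides do not say whether GB means 1000 MB or 1024 MB,
    # so both conventions are shown for comparison with the reported 4.6 GB.
    decimal_gb = estimated_disk_gb(VOLUMES_FINISHED, AVG_FILE_SIZE_MB)
    binary_gb = estimated_disk_gb(VOLUMES_FINISHED, AVG_FILE_SIZE_MB, 1024)
    print(f"Estimated disk space: {decimal_gb:.2f} GB (decimal), "
          f"{binary_gb:.2f} GB (binary)")

    # Assumption: the remaining volumes take the same average time.
    remaining = TOTAL_RTD_VOLUMES - VOLUMES_FINISHED
    print(f"Student time so far: "
          f"{student_hours(VOLUMES_FINISHED, AVG_STUDENT_HOURS):.1f} hours")
    print(f"Projected student time for {remaining} remaining volumes: "
          f"{student_hours(remaining, AVG_STUDENT_HOURS):.1f} hours")
```

Both estimates land close to the reported 4.6 GB total, which suggests the averages on the slide are consistent with one another. The projection is only a rough planning figure, since the condition of the remaining volumes may differ.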