Capacity Building: Passing on the Experience
World Digital Library Arab Peninsula Regional Group meeting
Dr. Noha Adly

Reaching significant milestones in digitization
With a focus on Arabic content

It all started at the BA Digital Lab…

The Digital Laboratory
Well equipped for different types of media

Digital Laboratory
• Digitizing various media including slides in multi-formats, negatives, books, manuscripts, pictures and maps
• Digitizing Bibliotheca Alexandrina’s valuable collections
• Many of the Library’s projects are highly dependent on the digital laboratory

Digital Lab Man Power
• 120 staff members
• Distributed over several teams
• Working 7 days / week
• 2 shifts / day
• Working in many collections simultaneously
A workflow and a workflow management system are essential to control and track the process

What is a Workflow?
• A workflow is a well-defined sequence of operations, declared as work of a [resource]*, during which documents, information or tasks are passed from one resource to another for action
  – According to defined procedural rules
  – Having an estimated time
  – Can be documented
  – Can be learned
*Resource: a person, a simple or complex mechanism, a group of persons, an organization of staff, or machines

Basic Digitization Workflow
• Digitization Phase (“Scanning”): Hardcopy is converted into a raw digital image
• Processing Phase: Raw digital image is enhanced to realize:
  – Better image quality
  – Better OCR accuracy
• OCR Phase: Extracts the text corresponding to the processed image contents

Basic Digitization Workflow
For each phase, we need to:
• Define the specs of the output (Quality)
• Set the procedure of work to guarantee quality
• Calculate the required time
• Automate tasks whenever possible
• Set benchmarks to monitor the progress

Why Workflow Management System?
1. Automation of task handling
2. Progress tracking
3. Process Management
4. Flexibility

Digital Assets Factory (DAF)
(DAF is the digitization workflow management system)
1. Automation of task handling
2. Progress tracking
  – Workflow Tracking
  – Pending Items
  – Late Jobs
  – Employee’s Rates
  – Build Customized Report
3. Process Management
  – Roles (Permissions)
  – Job Types
  – General Settings
  – Phases
  – Employee accounts
  – Workstations
  – Collections
4. Flexibility
  – Arabic Books Scanning
  – Arabic Books Processing
  – Arabic Books OCRing
  – Arabic Books Encoding & Publishing
  – Arabic Books QA
  – Arabic Books Archiving

Targeted Monthly Production Rate
≈ 5,000 books/month (1,800,000 pages)
HOW to reach the target?
Daily Rates (single shift)
  – Scanning: ≈ 3,000 pages/person
  – Processing: ≈ 3,000 pages/person
  – Latin OCR: ≈ 4,000 pages/person
  – Arabic OCR: ≈ 2,100 pages/person
Monitoring
• Rate/user (monitored during the shift)
• User rate & Rate/shift report
Reporting
• Weekly production
• Monthly production

BA’s digital collections are maintained within the institution’s Digital Assets Repository - DAR

Digital Assets Repository
• Developed to facilitate the creation, use and management of the digital library collections
• A repository for all types of digital material including slides in multi formats, negatives, books, manuscripts, pictures and maps, audio and video, thus preserving and archiving the digital media
• Provides public access to digitized collections through web-based search and browsing facilities

Digital Assets Repository
• DAR’s core consists of 4 fundamental modules:
  – The Digital Assets Factory (DAF) (http://wiki.bibalex.org/DAFWiki)
    • Responsible for the complete automation of the digitization cycle
    • It was developed using open source tools
  – The Digital Assets Metadata (DAM)
    • Keeps a unique and intact version of the digital assets’ metadata
    • Helps ensure that cataloging, indexing, browsing, searching and retrieval are done efficiently
    • In the latest version, DAM uses Fedora to manage the metadata
    • Based on METS/MODS standards
  – The Digital Assets Keeper (DAK)
    • A repository for the digital assets that are either produced by DAF or are directly introduced into the repository
  – Digital Assets Publishers (DAP)
    • Components that publish and display the digital assets stored in DAK
      – Book viewers
      – Search engines

Imparting Capacity Building
Sharing the BA’s technical expertise with external organizations
ISIS has conducted capacity building workshops:
• Yale University, December 2007: Arabic and Middle Eastern Electronic Library
• Municipal Administration Modernization (MAM) program in Syria, March 2009
• Kuwait Institute for Science and Research “KISR”, January 2010

Capacity Building Scope
Passing on the experience of building an institutional repository to maintain the production of high quality digital assets in terms of digitizing, processing, OCRing, encoding, archiving and publishing, based on well known standards.

Capacity Building Program

The capacity building program
• Overviewing BA/ICT facilities (Digital Library, Internet Archive, VISTA, HPC, System infrastructure design, etc.)
The capacity building program
• General tour overviewing BA/ICT facilities
• Digitization process
  – Digital image parameters
  – Compression formats
  – Digitization workflow and phases
• Hands on Scanning and Image processing
  – Enhancing image and text quality
  – Images rendering a good OCR
• Quality Assurance
• Digital Assets Factory (DAF)
  – Automation of the digitization workflow
  – DAF key features
  – Job life cycle
• OCR
  – Analysis of the input and classifying it to different fonts
  – Automating OCR procedure
• Online Storage
• Library Services
  – VTLS including its different modules
  – LIS servers and DB maintenance
  – OPAC and WEBAC customization
  – In-house developed systems
• Multimedia delivery framework

Disseminating knowledge in the digital age…

Thank You
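
Backup: staffing arithmetic behind the production target

The Digital Lab figures quoted in this deck (a target of about 1,800,000 pages/month, two shifts a day, and per-person daily rates for a single shift) can be turned into a rough head count per phase. The sketch below is only an illustration, not the BA's own planning method. It assumes a 30-day working month (the deck says the lab works 7 days a week) and assumes, for simplicity, that the whole monthly volume passes through each phase shown. The function name `staff_per_shift` and the dictionary keys are made up for this example.

```python
import math

TARGET_PAGES_PER_MONTH = 1_800_000  # from the deck (about 5,000 books/month)
SHIFTS_PER_DAY = 2                  # from the deck
WORKING_DAYS_PER_MONTH = 30         # assumption: 7 days/week over a 30-day month

# Pages per person per single shift, as quoted in the deck.
RATES = {
    "scanning": 3_000,
    "processing": 3_000,
    "latin_ocr": 4_000,
    "arabic_ocr": 2_100,
}


def staff_per_shift(phase: str, pages_per_month: int = TARGET_PAGES_PER_MONTH) -> int:
    """Return the number of people needed on each shift for one phase.

    Assumes the full monthly page volume goes through this phase and that
    the load is spread evenly over every shift of the month.
    """
    shifts_per_month = WORKING_DAYS_PER_MONTH * SHIFTS_PER_DAY
    pages_per_shift = pages_per_month / shifts_per_month
    # Round up: a fractional person still means one more staff member.
    return math.ceil(pages_per_shift / RATES[phase])


if __name__ == "__main__":
    for phase in RATES:
        print(f"{phase}: {staff_per_shift(phase)} people per shift")
```

Under these assumptions each shift has to handle 30,000 pages, which works out to 10 people on scanning, 10 on processing, and 15 on Arabic OCR (or 8 on Latin OCR if the volume were all Latin script). In practice the split between Arabic and Latin material would change the OCR numbers.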