Capacity Building (Noha Adly, Bibliotheca Alexandrina)

Capacity Building
Passing on the Experience
World Digital Library Arab Peninsula Regional Group meeting
Dr. Noha Adly
Reaching significant milestones in digitization
With a focus on Arabic content
It all started at the BA Digital Lab…
The Digital Laboratory
Well equipped for different types of media
Digital Laboratory
• Digitizing various media including slides in multiple formats, negatives, books, manuscripts, pictures and maps
• Digitizing Bibliotheca Alexandrina’s valuable collections
• Many of the Library’s projects are highly dependent on the digital laboratory
Digital Lab Manpower
• 120 staff members
• Distributed over several teams
• Working 7 days / week
• 2 shifts / day
• Working on many collections simultaneously
Workflow & Workflow Management system
are essential to control and track the process
What is a Workflow ?
• A workflow is a well-defined sequence of operations, declared as the work of a [resource]*, during which documents, information or tasks are passed from one resource to another for action
– According to defined procedural rules
– Having an estimated time
– Can be documented
– Can be learned
*Resource: a person, a simple or complex mechanism, a group of persons, an organization of staff, or machines
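A minimal sketch of the definition above, assuming a workflow can be modeled as an ordered list of operations, each performed by a resource and carrying an estimated time. The class names, field names, resource labels and minute values are illustrative placeholders, not part of any BA system.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Operation:
    name: str               # e.g. "Scanning"
    resource: str           # person, team or machine performing the step (hypothetical labels)
    estimated_minutes: int  # estimated time for this step (placeholder values below)


@dataclass
class Workflow:
    operations: List[Operation] = field(default_factory=list)

    def total_estimated_minutes(self) -> int:
        """Sum of the estimated times of all operations."""
        return sum(op.estimated_minutes for op in self.operations)

    def handoffs(self) -> List[Tuple[str, str]]:
        """Pairs (from_resource, to_resource) as an item passes from step to step."""
        ops = self.operations
        return [(a.resource, b.resource) for a, b in zip(ops, ops[1:])]


workflow = Workflow([
    Operation("Scanning", "scanning team", 30),
    Operation("Processing", "processing team", 20),
    Operation("OCR", "OCR team", 25),
])

print(workflow.total_estimated_minutes())
print(workflow.handoffs())
```

Keeping the estimated time on each operation is what makes the workflow measurable: the same structure can later be compared against actual times to see where work is falling behind.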
Basic Digitization Workflow
• Digitization Phase (“Scanning”): Hardcopy is converted into raw digital image
• Processing Phase: Raw digital image is enhanced to realize:
– Better image quality
– Better OCR accuracy
• OCR Phase: It extracts the text corresponding to the processed image contents
Basic Digitization Workflow
For each phase, we need to:
• Define the specs of the output (Quality)
• Set the procedure of work to guarantee quality
• Calculate the required time
• Automate tasks whenever possible
• Set benchmarks to monitor progress
Why Workflow Management System?
1. Automation of task handling
2. Progress tracking
3. Process Management
4. Flexibility
Digital Assets Factory DAF
(DAF is the digitization workflow management system)
1. Automation of task handling
2. Progress tracking
– Workflow Tracking
– Pending Items
– Late Jobs
– Employee’s Rates
– Build Customized Report
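The tracking views listed above (pending items, late jobs, employee rates) can be illustrated with a small sketch. It assumes each job record carries an employee, a page count, a due date and an optional finish date; these field names and the sample data are hypothetical and do not reflect DAF's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Job:
    job_id: str
    employee: str
    pages: int
    due: date
    finished: Optional[date] = None  # None means the job is still pending


def pending_items(jobs: List[Job]) -> List[Job]:
    """Jobs that have not been finished yet."""
    return [j for j in jobs if j.finished is None]


def late_jobs(jobs: List[Job], today: date) -> List[Job]:
    """Jobs finished after their due date, or unfinished and already past it."""
    return [j for j in jobs if (j.finished or today) > j.due]


def pages_per_employee(jobs: List[Job]) -> Dict[str, int]:
    """Total pages completed by each employee (finished jobs only)."""
    totals: Dict[str, int] = defaultdict(int)
    for j in jobs:
        if j.finished is not None:
            totals[j.employee] += j.pages
    return dict(totals)


# Sample data (made up for illustration)
jobs = [
    Job("J1", "employee_a", 300, date(2010, 1, 10), date(2010, 1, 9)),
    Job("J2", "employee_b", 250, date(2010, 1, 10), date(2010, 1, 12)),
    Job("J3", "employee_a", 400, date(2010, 1, 11)),
]
today = date(2010, 1, 12)

print([j.job_id for j in pending_items(jobs)])
print([j.job_id for j in late_jobs(jobs, today)])
print(pages_per_employee(jobs))
```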
3. Process Management
– Roles (Permissions)
– Job Types
– General Settings
– Phases
– Employee accounts
– Workstations
– Collections
4. Flexibility
– Arabic Books Scanning
– Arabic Books Processing
– Arabic Books OCRing
– Arabic Books Encoding & Publishing
– Arabic Books QA
– Arabic Books Archiving
Targeted Monthly Production Rate
≈ 5,000 books/month (1,800,000 pages)
HOW to reach the target?
Daily Rates (single shift)
– Scanning: ≈ 3,000 pages/person
– Processing: ≈ 3,000 pages/person
– Latin OCR: ≈ 4,000 pages/person
– Arabic OCR: ≈ 2,100 pages/person
Monitoring
• Rate/user (monitored during the shift)
• User rate & Rate/shift report
Reporting
• Weekly production
• Monthly production
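The monthly target can be related to the daily rates with simple arithmetic. The sketch below uses the 2 shifts/day, 7 days/week schedule of the Digital Lab and the per-person rates listed above. It additionally assumes a 30-day month and that every page passes through each phase; neither assumption is stated in the slides, and the split between Latin and Arabic pages is not given, so each OCR figure is an upper bound for the case where all pages are in that script.

```python
import math

PAGES_PER_MONTH = 1_800_000  # target from the slide
DAYS_PER_MONTH = 30          # assumption, not stated in the slides
SHIFTS_PER_DAY = 2           # from the Digital Lab staffing figures

# Daily rates per person for a single shift (from the slide)
RATES = {
    "Scanning": 3_000,
    "Processing": 3_000,
    "Latin OCR": 4_000,
    "Arabic OCR": 2_100,
}

# Pages that must clear each phase in one shift to stay on target
pages_per_shift = PAGES_PER_MONTH / (DAYS_PER_MONTH * SHIFTS_PER_DAY)


def staff_per_shift(rate: int) -> int:
    """People needed on one shift for a phase to keep up with the target."""
    return math.ceil(pages_per_shift / rate)


for phase, rate in RATES.items():
    print(f"{phase}: {staff_per_shift(rate)} people per shift")
```

Under these assumptions the target works out to 30,000 pages per shift, which corresponds to 10 people each for scanning and processing, and at most 8 (Latin) or 15 (Arabic) people for OCR.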
BA’s digital collections are maintained within the
institution’s Digital Assets Repository - DAR
Digital Assets Repository
• Developed to facilitate the creation, use and management of the digital library collections.
• A repository for all types of digital material including slides in multiple formats, negatives, books, manuscripts, pictures and maps, audio and video, thus preserving and archiving the digital media
• Provides public access to digitized collections through web-based search and browsing facilities
Digital Assets Repository
• DAR’s core consists of 4 fundamental modules:
– The Digital Assets Factory (DAF)
http://wiki.bibalex.org/DAFWiki
• Responsible for the complete automation of the digitization cycle
• It was developed using open source tools
– The Digital Assets Metadata (DAM)
• Keeps a unique and intact version of the digital assets’ metadata
• Helps ensure that cataloging, indexing, browsing, searching and retrieval are done efficiently
• In the latest version, DAM uses Fedora to manage the metadata.
• Based on METS/MODS standards
– The Digital Assets Keeper (DAK)
• A repository for the digital assets that are either produced by DAF or are directly
introduced into the repository.
– Digital Assets Publishers (DAP)
• Components that publish and display the digital assets stored in DAK
– Book viewers
– Search engines
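Since DAM is described as based on METS/MODS, a small example of a MODS descriptive record may help make the idea concrete. The sketch below uses only Python's standard library and the public MODS v3 namespace. The title, author and language values are placeholders, and the record layout is a generic illustration, not DAM's actual output.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def build_mods(title: str, author: str, language_code: str) -> ET.Element:
    """Build a minimal MODS record with a title, a personal name and a language."""
    mods = ET.Element(q("mods"))

    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title

    name = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name, q("namePart")).text = author

    language = ET.SubElement(mods, q("language"))
    term = ET.SubElement(
        language, q("languageTerm"), {"type": "code", "authority": "iso639-2b"}
    )
    term.text = language_code
    return mods


# Placeholder values for illustration only
record = build_mods("Placeholder title", "Placeholder author", "ara")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

In a METS-based repository, a descriptive record like this would typically be wrapped in a METS document alongside structural and file information for the digitized object.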
Imparting Capacity Building
Sharing the BA’s technical expertise with external
organizations
ISIS has conducted capacity building workshops:
• Yale University, Arabic and Middle Eastern Electronic Library: December 2007
• Municipal Administration Modernization (MAM) program in Syria: March 2009
• Kuwait Institute for Scientific Research (KISR): January 2010
Capacity Building Scope
Passing on the experience of building an institutional repository to maintain the production of high-quality digital assets in terms of digitizing, processing, OCRing, encoding, archiving and publishing, based on well-known standards.
Capacity Building Program
The capacity building program
• General tour of BA/ICT facilities
(Digital Library, Internet Archive, VISTA, HPC, System infrastructure design, etc.)
• Digitization process
– Digital image parameters
– Compression formats
– Digitization workflow and phases
• Hands-on scanning and image processing
– Enhancing image and text quality
– Producing images that yield good OCR results
• Quality Assurance
• Digital Assets Factory (DAF)
– Automation of the digitization workflow
– DAF key features
– Job life cycle
• OCR
– Analyzing the input and classifying it by font
– Automating the OCR procedure
• Online Storage
• Library Services
– VTLS including its different modules
– LIS servers and DB maintenance
– OPAC and WEBAC customization
– In-house developed systems
• Multimedia delivery framework
Disseminating knowledge in the digital age…
Thank You