Digital libraries - NAL's Institutional Repository

DIGITAL LIBRARIES: AN OVERVIEW
Dr. I.R.N. Goudar
Head, ICAST
National Aerospace Laboratories
Bangalore – 560017
goudar@css.nal.res.in
One day Seminar on
Digital Library Services for Technical Colleges
Basaveshwar Engineering College
Bagalkot – 587102
15 April 2006
Traditional Libraries

Libraries with the same purpose,
functions, and goals






Collection development and management
Technical Processing
Index creation
Counter Transactions
Reference work
Preservation
What is a Digital Library?







A Service? An Architecture?
A set of Information Resources?
A set of tools to locate, search, retrieve
information?
Possibly the tools to create such resources and
services also fall within the purview of DLs
Digital face of traditional libraries
Includes both digital and traditional collections
Backbone and nervous system of libraries.
Defining the Digital Library
“A digital library service is an assemblage of digital computing,
storage, and communications machinery together with the
software needed to reproduce, emulate, and extend the
services provided by conventional libraries based on paper and
other material means of collecting, storing, cataloguing, finding,
and disseminating information.” (Gladney, H.M., et al., 1994)
“Digital Libraries are a set of electronic resources and associated
technical capabilities for creating, searching, and using
information… they are an extension and enhancement of
information storage and retrieval systems that manipulate
digital data in any medium (text, images, sounds, static or
dynamic images) and exist in distributed networks” (Borgman,
1996)
What is a Digital Library?

Borgman identifies two major aspects


DL researchers from Computer Science focus on
content for user communities and therefore
emphasize the enabling technologies
Library professionals appear to emphasize DLs as
services
However, DLs require the skills of both librarians
and computer scientists
What is important?

Site Neutrality
Access-Anytime (24*7)
Anywhere (Office, Residence, Travel)
By Anyone






Open Access and Sharing of information
Greater variety and granularity of information
Up-to-dateness
New forms of rendering (new genres)
Integration of digital media into traditional collections
Digital libraries are different in that they are designed to support
the creation, maintenance, management, access to, and
preservation of digital content
Five Elements in Various Definitions of DL





The digital library is not a single entity;
The digital library requires technology to link the resources
of many;
The linkages between the many digital libraries and
information services are transparent to the end users;
Universal access to digital libraries and information services
is a goal;
Digital library collections are not limited to document
surrogates: they extend to digital artefacts that cannot be
represented or distributed in printed formats.
Association of Research Libraries (1995)
Goals of DL


First-generation digital library

Focused on digitization technology, metadata
schemes, data management techniques, and
digital preservation.
Second-generation digital library

Exploring new opportunities and developing new
competencies.
Third-generation digital library

Focusing instead on fully integrating digital material
into the library’s collections through a modular
systems architecture.
Digital Libraries Shorten the Chain
[Diagram: Author, Reviewer, Editor, Publisher, A&I, Consolidator, Library, Digital Library, User]
Roles
[Diagram: Reader, Author, Librarian, Editor, Learner, Teacher]
Ingredients for DLs

Hardware
The minimum machinery to do the job

Software
The programs for handling data

Digital Objects
Articles, conference papers, theses,……

Basic Skills
Things one has to learn
Hardware

A Server



You’ll need access to a web server
A good PC
Scanners
Flatbed – Auto feed, Back to back
MF
Book Scanner
Software

Open Source Software (OSS)
DSpace, EPrints, Fedora, GSDL……

Proprietary software you can’t avoid
Image editing and optical character recognition
software has to be purchased
Hardware- Software Network





High-speed local networks and fast
connections to the Internet
Relational databases that support a variety of
digital formats
Full text search engines to index and provide
access to resources
Web servers and FTP servers (both intranet
and internet)
Electronic document management functions
Digital Library Content
Content Types
Text
Documents
Articles
Reports
Books
Manuscripts
Newspapers
Theses
Tech. Reports
Video
Audio
Speech
Music
Movies
Geographic Information
(Aerial) Photos
Software, Programs
Genome
Human
Animal
Plant
Bio Information
Images and Graphics
Photographs
Models
Simulations
Paintings
2D
3D
Content is King
The information content is
more important than the
systems used for its storage,
management and retrieval
Objects should not be “locked”
in specific DLs or archives
Types of Digital Collections
Digitization


Converting paper and other media in existing
collections to digital form
Acquisition of original digital works


Created by publishers and scholars, such as
electronic books, journals, and datasets
Access to external materials


Such as Web sites, other library collections, or
publishers' servers
Resources








Bibliographic databases that point to both
paper and digital materials
Indexes and finding tools
Collections of pointers to Internet resources
Directories
Teaching and pedagogic materials
Photographs
Numerical data sets
E-books and e-journals
Creating DLs …

Six steps

Selecting
Acquiring
Digitization
Organizing
Archiving

Providing Access




Process
Selection of Books
Identification of Books
Meta Data
Scanning Process
Scanning
Image Processing & QC
OCR
Publishing
Digitization
“Conversion of any fixed or analogue
media--such as books, journal articles,
photos, paintings, microforms--into
electronic form through scanning,
sampling, or in fact even re-keying.”
Digitization Process ….


Determine copyright or restrictions
Digital conversion
- Outsource or in house?
- Text conversion, formats, headers, compression, and delivery media
- Digital capture with camera or scanner?
- File handling
- File naming
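To make the file handling and file naming points concrete, here is a minimal sketch. The naming convention (collection code, item number, page number, copy role) and the directory layout are assumptions for illustration only; they are not taken from the slides.

```python
from pathlib import PurePosixPath

# Hypothetical convention: <collection>_<item>_<page>_<role>.<ext>
# "master" is the archive copy, "access" the derivative served to users.
ROLE_EXT = {"master": "tif", "access": "jpg"}


def collection_code(collection: str) -> str:
    """Normalise a collection name so it is safe inside a file name."""
    return collection.strip().lower().replace(" ", "-").replace("_", "-")


def page_filename(collection: str, item: int, page: int, role: str) -> str:
    """Build a predictable, sortable file name for one scanned page."""
    if role not in ROLE_EXT:
        raise ValueError(f"unknown role: {role!r}")
    if item < 1 or page < 1:
        raise ValueError("item and page numbers start at 1")
    code = collection_code(collection)
    return f"{code}_{item:05d}_{page:04d}_{role}.{ROLE_EXT[role]}"


def page_path(root: str, collection: str, item: int, page: int, role: str) -> str:
    """Keep each item in its own directory so files can be handled in bulk."""
    name = page_filename(collection, item, page, role)
    folder = PurePosixPath(root) / collection_code(collection) / f"{item:05d}" / role
    return str(folder / name)
```

Zero-padded numbers keep pages in reading order when files are sorted by name, which helps later steps such as merging files.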
Digitization Process







Preparing the objects
Scanning
Moving files to temporary storage
Value addition: metadata preparation,
etc.
Long term storage
Derivative image/thumbnail for
access copy
Merging files
Digital Production Process
[Diagram: Data Management, Workflow Management, Content Management, Project Management, Quality Management, Supplier Management]
Data Management
• Formats: TeX, PDF, PS
• Metadata and content data
• Structuring (Tagging)
• Media neutrality
Workflow Management
• Processing
• Conversion
• Automation
• Interfaces - input / output
Content Management
• Style files
• Information / Object models
• Archiving
Quality Management
• Data consistency
• Process consistency
• Content consistency
Various Workflows
[Workflow diagram. Input: RTF, TeX, Camera ready. Processing: Normalization, Content Processing. Output: Books, Archive, Journals.]
Software
OCR: Optical Character Recognition

Many good OCR programs are on the market,
with prices ranging from Rs. 5,000 to Rs. 20,000.
Examples, among many others:
Readiris (http://www.readiris.com/)
OmniPage (http://www.omnipage.com/)
FineReader (http://www.finereader.com/)
Possible Delivery Formats




Pure image formats: TIFF, JPEG
Open encoded formats: XML, HTML,
ASCII, and Unicode
Hybrid formats: PDF, DjVu – can contain
both image and text
Proprietary formats: Microsoft Word,
WordPerfect
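The four categories above can be used to sort incoming files. This is a rough sketch: the category names come from the slide, but the mapping from file extensions to categories is an illustrative assumption and is not exhaustive.

```python
from pathlib import Path

# Categories as listed on the slide; the extension sets are assumptions.
FORMAT_CATEGORIES = {
    "pure image": {".tif", ".tiff", ".jpg", ".jpeg"},
    "open encoded": {".xml", ".html", ".htm", ".txt"},
    "hybrid": {".pdf", ".djvu"},
    "proprietary": {".doc", ".wpd"},
}


def delivery_category(filename: str) -> str:
    """Return the delivery-format category for a file, judged by extension."""
    ext = Path(filename).suffix.lower()
    for category, extensions in FORMAT_CATEGORIES.items():
        if ext in extensions:
            return category
    return "unknown"
```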
Good Principles

What to digitize?


Selection and policy are important
Collection description is important

Such as scope, format, restrictions on
access, ownership, etc.
Digitization: Issues





Copyright
Access copy and archive copy
File size
Storage media (CD, hard disk…)
File format (TIFF, JPEG…)
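The file size issue can be estimated before scanning starts. The sketch below computes the uncompressed size of one page; the A4 page size, the 300 dpi resolution, and the two bit depths are assumed values chosen for illustration.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Raw size of one scanned page before any compression is applied."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


A4 = (8.27, 11.69)  # page size in inches (assumption)

colour = uncompressed_bytes(*A4, dpi=300, bits_per_pixel=24)
bitonal = uncompressed_bytes(*A4, dpi=300, bits_per_pixel=1)
print(f"24-bit colour: {colour / 1e6:.1f} MB")   # 24-bit colour: 26.1 MB
print(f"1-bit bitonal: {bitonal / 1e6:.2f} MB")  # 1-bit bitonal: 1.09 MB
```

At these assumed settings a 200-page book is roughly 5 GB of raw colour images, which is one reason a large archive copy is usually kept apart from a smaller compressed access copy.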
Challenges in Publishing

Preservation of layout

Searchability of content and metadata

Efficient image compression

Easy browsing of books

Accommodating low-bandwidth users

Multilingual text support

Multipaging
Digitization .. Factors

Collection strengths



Unique collections


what is reasonable for any one institution to collect or digitize
Technical architecture


Like demands of a curriculum
Manageable portions of collections


only copies of something
Priorities of user communities


digitizing selected portions
adding new digital works
Also a factor in selecting who digitizes what
Skills of staff

whose staff don't have the necessary skills
Retrospective Conversion




Complete conversion would be impractical or impossible
technically, legally, and economically
Digitization of a particular special collection or a portion of
one
- which is highly valued
Highlight a diverse collection
High-use materials
Approaches can be used alone or in combination
depending upon a particular institution's goals
Criteria for Selecting Content




Their potential for long-term use
Their intellectual or cultural value
Whether they provide greater access
than possible with original materials
(e.g., fragile, rare materials)
Whether copyright restrictions or
licensing will permit conversion.
Metadata



The data that describes the content and attributes of
any particular item
Key to resource discovery and use of any document
Descriptive metadata facilitates searching and discovery;
administrative and structural metadata assist in
object viewing, management, and preservation.
Elements of Dublin Core







Title
Creator
Subject and Keywords
Description
Publisher
Contributor
Date








Format
Resource Identifier
Resource Type
Source
Language
Relation
Coverage
Rights Management
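As an illustration of how these elements describe one item, the sketch below serialises a record to XML with the Python standard library. The record values are illustrative (loosely based on this talk's title slide), and the identifier and the wrapping `record` element are hypothetical; only the Dublin Core namespace and element names are standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set, version 1.1
ET.register_namespace("dc", DC_NS)

# Illustrative record; the identifier is made up for this example.
record = {
    "title": "Digital Libraries: An Overview",
    "creator": "Goudar, I.R.N.",
    "subject": ["digital libraries", "digitization", "metadata"],
    "date": "2006-04-15",
    "type": "Presentation",
    "identifier": "example:dl-overview-2006",
    "language": "en",
}


def to_dublin_core_xml(fields: dict) -> str:
    """Serialise a dict of Dublin Core elements; list values repeat the element."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = v
    return ET.tostring(root, encoding="unicode")


xml_text = to_dublin_core_xml(record)
print(xml_text)
```

Every Dublin Core element is optional and repeatable, which is why "subject" can carry several keywords here.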
Barriers



Digital objects are less fixed, easily copied, and
remotely accessible by multiple users simultaneously
Libraries are mostly caretakers of information and
hold copyrighted material subject to restrictions
Libraries need to develop mechanisms for managing copyright
that allow them to provide information without
violating it; this is called rights management
Rights Management




Usage tracking
Identifying and authenticating users
Providing the copyright status of each digital
object, and the restrictions on its use or the
fees associated with it
Handling transactions with users by allowing
only so many copies to be accessed, or by
charging them for a copy, or by passing the
request on to a publisher
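The four functions above can be sketched as a small in-memory model. All class names, field names, and outcome strings are hypothetical; a real system would sit behind proper authentication and a database.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DigitalObject:
    object_id: str
    copyright_status: str              # e.g. "public-domain" or "in-copyright"
    max_copies: Optional[int] = None   # None means no limit on copies served
    fee: float = 0.0                   # charge per copy, 0 for free access
    copies_served: int = 0


@dataclass
class RightsManager:
    known_users: set
    usage_log: list = field(default_factory=list)

    def request(self, user: str, obj: DigitalObject) -> str:
        """Authenticate the user, apply the object's restrictions, log the result."""
        if user not in self.known_users:
            outcome = "denied: unknown user"
        elif obj.max_copies is not None and obj.copies_served >= obj.max_copies:
            outcome = "forwarded to publisher"  # copy limit reached
        else:
            obj.copies_served += 1
            outcome = f"granted (fee {obj.fee:.2f})" if obj.fee else "granted"
        # Usage tracking: every request is recorded, whatever the outcome.
        self.usage_log.append((user, obj.object_id, obj.copyright_status, outcome))
        return outcome
```

For example, an object with `max_copies=1` is granted to the first authenticated user and any later request is passed on to the publisher.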
Preservation


Keeping digital information available in
perpetuity
Real issue is technical obsolescence


Like the deterioration of paper in the paper
age
Constantly coming up with new technical
solutions
Three Types of Preservation



Preservation of the storage medium
Preservation of access to content
Preservation of fixed-media materials
through digital technology
Preservation of the Storage Medium




Tapes, hard drives, and floppy discs have a
very short life span
Become obsolete within two to five years,
when they are replaced by better technology
Possibility of non-availability of the hardware
or software to read them
May have to keep moving digital information
from storage medium to storage medium
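Moving files between storage media is only safe if each copy is verified. The sketch below shows one common way to do this with a SHA-256 checksum; the slides do not prescribe a method, so the checksum approach and the function names are assumptions.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, target_dir: Path) -> Path:
    """Copy a file to new storage and confirm the copy is bit-identical."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        target.unlink()
        raise IOError(f"checksum mismatch after copying {source}")
    return target
```

The old copy should be retired only after `migrate` returns without error.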
Preservation of Access



Access to the content of documents, regardless
of their format:
- When the formats (e.g., Adobe Acrobat PDF)
containing the information become obsolete
- Translate data from one format to another for
preserving the ability of users to retrieve and
display the information content
- Data migration is costly
Still no standards for data migration
Distortion or information loss
Fixed-media through Digital Technology



Replacement for current preservation media
- such as microforms
No common standards for the use of digital media as
a preservation medium
It is unclear whether digital media are going to handle
the task of long-term preservation
Digital Library Benefits: Individual
Gain access to the holdings of libraries worldwide
through automated catalogs. Locate both physical
and digitized versions of scholarly articles and books.

Optimize searches, simultaneously search the
Internet, commercial databases, and library
collections.

Save search results and conduct additional
processing to narrow or qualify results.

From search results, click through to access the
digitized content or locate additional items of interest.
All of these capabilities are available from the
desktop or other Web-enabled device such as a
personal digital assistant or cellular telephone.

Digital Library Benefits: Classroom Projects





Capability to enhance the classroom
experience or conduct learning apart from a
physical campus
The digital library is a core component of the virtual learning environment (VLE)
Changing the relationships between the
library and other parts of the academic
enterprise
Integrate authoring, analysis, and distribution
tools that facilitate the reuse and repurposing
of digital content
Collections and services can be integrated
into the institutional, national, and worldwide
Digital Library Standards



Common User Interface:
Data Handling and Interchange:
Graphic Formats – JPEG, TIFF, GIF, PNG, Group 4 Fax, CGM
Structured Documents – SGML, HTML, XML
Moving Pictures/3-D – MPEG, AVI, GIF89A, QuickTime, Real
Video, ViviActive, VRML
Metadata:
Resource Description – Dublin Core, WHOIS++ Templates,
US-MARC, TEI Headers, Other Open Source and Domain
Specific Standards.
Resource Identification – URN, PURL, DOI, SICI
Security, Authentication and payment services:
Emerging e-Commerce Standards.
Indian DL Initiatives: Contents









Books (out of copyright)
Scholarly Journals
Theses
Institutional E-Prints
Manuscripts
Data
Newspapers
Metadata Level
Portal and Gateway Services
Government, Judicial, Financial, Land Records



Ministry websites include policy and planning
documents, annual reports, budget etc.
Goa, Andhra Pradesh, Karnataka,
Maharashtra, Tamil Nadu have made significant
headway
Judgments of Supreme Court and High Courts
covered
Digital Library of India at IISc, Bangalore
• Mission: Free access to human knowledge through a portal
• Objectives:
To capture all books in digital format (1 million by 2005)
Test bed for improved scanning techniques, OCR, indexing
• Books, journals, palm leaves
• > 1 lakh (100,000) books in English, Telugu, Kannada, Tamil, Sanskrit, Urdu
• 100 scanners in 16 scanning centres
• Plan for 1 million documents by 2005
• Science, arts, culture, music, movies, traditional medicine
• Will be mirrored at several locations in the world
• Collaboration: Universal Library Project, CMU
http://www.dli.ernet.in/
IISc: Other Activities
Vigyan:
Website on Indian S & T
Collaboration with NISSAT/DSIR
• Indo-French Cyber University
Initially PG in Applied Mathematics
• E-prints at IISc by NCSI (http://eprints.iisc.ernet.in/)
• Online digital repository of IISc research papers
Research papers (preprints, post-prints), book chapters, tech reports,
unpublished findings, conf papers, magazine articles
• Set up using e-prints.org open source software
• Part of worldwide institutional e-print archives
Institutional Repositories






Indian Institute of Science
National Aerospace Laboratories
National Chemical Laboratory
National Institute of Oceanography
ISI – Mathematics
DRTC – LDL
Raman Research Institute
IIM Kozhikode
Scholarly Science Journals
• Indian Academy of Sciences (IAS) – 11 journals
• Indian National Science Academy – 4 journals
• Indian Medlars Centre (IndMed) – 22 journals
Vidyanidhi: Dept. of LIS, Univ. of Mysore
• Digital Library and E-Scholarship Portal
• Indian Theses Database
• Indian ETD Collection
• Training Program for improving quality
• Supported by DSIR, GOI
• Part of global ETD initiative
• Supported by Ford Foundation and Microsoft
http://www.vidyanidhi.org.in/
Theses initiatives by others: IIT Delhi and IIT Mumbai
Indira Gandhi National Centre for the Arts
• Digital Images
• Electronic Books
• Video Recordings
• Papers and Essays
• Audio Recordings
• Research Reports
• Databases
• Newsletters
• Bibliographies
• Conference Proceedings
• Multimedia Documentation
• Manuscripts in India
• Kalakalpa (Journal)
• In-house Articles
Mumbai Asiatic Society
• Rare books (dating back to 1632)
• Manuscripts (Sanskrit, Pali, Tibetan, Prakrit, Arabic, Persian, etc.)
• Maps, coins
• Buddhist relics
• Book preservation laboratories
Microfilming – now digitisation
http://education.vsnl.com/asbl/treasure.html
National Library, Calcutta
Manuscripts (Work in Progress)
Paper – 3000, Palm Leaf – 334
Books (pre-1900, Indian pre-1920)
6600 Titles, 2.5 M pages, 548 CDs
Bengali Journal (Prabasi)
East India Company Records
Many Diaries
Orunudoi: Assamese journal
In Bengali
Archives of Indian Labour
V.V. Giri National Labour Institute
Heritage of Indian Working Class
• Commissions on Labour
• Oral History Collections
• Trade Union Collections
• Regional Collections
• Strike Collections
http://www.indialabourarchives.org/
India: DL Issues
• Object identification: non-availability, coordinated efforts
• Technology and infrastructure
• Standards: metadata
• Funds
• Networking of minds
• Multiple languages
• Inhibitions and reservations (libraries and heritage materials)
• IPR
DLI in India: Suggestions








Distributed National Network with Global Access
• Institutional digital repositories
• Open access science journals
• Content based catalogue (metadata)
• Portal giving links to various activities
Intensive Training of Librarians, Archaeologists,
Curators, etc.
Improve Technology and Infrastructure
Adopt Suitable Standards
Language Tools
Modify IPR to suit Open Archive
Enhance Capability of Integrated Library Automation Systems
to handle DL Features
Compilation of Directory of DL Technologies and Vendors
Bibliography on Digital Libraries
http://sunsite.berkeley.edu/CurrentCites/bibondemand.cgi?query=digital+library