Developing Digital Libraries: Technologies and Challenges
Dr. Mohan Raj Pradhan
Dept. of Lib. & Inf. Sc.
Tribhuvan University
Kathmandu, Nepal
Email: mpradhan@healthnet.org.np
Abstract
This paper discusses the basics of developing a digital library and explains the new concepts
underlying digital library development procedures with regard to technologies and
managerial skills. Measures are needed to overcome the problems of computer viruses
and unauthorised use. Initial investment in digital libraries is high, as is maintenance;
it is therefore essential to explore new sources of funding.
Keywords
Libraries, Information technology, Internet, Developing countries, Electronic data
interchange, Funding
1 Introduction
The digital library (DL) is a new concept. It has brought a phenomenal change in
the way information is collected, preserved and disseminated around the world. For the
profession of librarianship, this turn of events is a blessing in disguise. The concept
has entered the scene at a time when the traditional library is fading and fast
losing its value against the powerful growth of the Internet and the virtual library. But the Internet is
just a tool that facilitates access to available information. It does not fulfil the significant
societal role of preserving and diffusing human knowledge through the ages and
generations of human society. Moreover, the Internet simply provides everything that is published,
regardless of quality, authenticity or reliability.
At this point, the time-tested profession of librarianship has entered the
technology scene with its centuries-old techniques: selecting and acquiring quality
objects of human knowledge (in this case digital documents), performing subject analysis
and cataloguing (in this case metadata definition), organizing them into
searchable collections accessible via the web (called digital libraries) and preserving them for
future use.
2 Development of the DL concept
Digital libraries began to appear in the early 1990s as research and development projects
centered within the computer science departments of universities, sometimes funded by
government grants. As these projects matured, Information Technology (IT) groups
began to partner with the library to develop campus-wide standards for the operation of
digital libraries as a part of the education enterprise.
With the introduction of the Internet and intranets, it was no longer considered practical to go
to a particular place to access information sources at a particular time of day. Because of
the time value of information, users expect instant access to information, from any
location, at any time. The digital library in its current shape is an attempt to fulfill this
objective.
3 Defining the Digital Library
In the literature, there are many definitions of the term digital library. This article
uses the definition that is most logical from the viewpoint of librarians, proposed by
the American Digital Library Federation (1998):
“Digital libraries are organizations that provide the resources, including the specialised staff, to
select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and
ensure the persistence over time of collections of digital works, so that they are readily and
economically available for use by a defined community or set of communities.”
Based on the above definition, Cleveland (1998) gave some of its characteristics. One of these
characteristics is:
“Digital libraries are the digital face of traditional libraries and include both electronic (digital)
as well as print and other (e.g. film, sound) materials.”
Cleveland (1998) also thought that: “In reality, digital libraries will not be a single, complete
digital system that allows users to promptly access all information, for all disciplines, from
anywhere around the world. Instead, they will most likely be a collection of disparate
resources and disparate systems, catering to specific communities and user groups, created for
specific purposes. They will also include, perhaps indefinitely, paper-based collections.”
Sharma and Vishwanathan (2001) said that “Growth of digital libraries involves
digitisation of existing library materials; connectivity to the users in the world online and
offline; integration with networking; and availability on the World Wide Web.”
4 Key components
A fully developed digital library environment involves the following elements:
a. Digital documents (either born digital or converted to digital form);
b. The extraction or creation of metadata or indexing information describing the
content to facilitate searching, as well as administrative and structural metadata to
assist in object viewing, management and preservation;
c. Storage of digital content and metadata in an appropriate multimedia repository. The
repository should meet the requirements of intellectual property rights;
d. Client services for the browser, including repository querying and workflow;
e. Content delivery via file transfer or streaming media;
f. User access through a browser or dedicated client; and
g. A private or public computer network.
5 Developing Digital Libraries—Team Approach
Very few people will have all the skills required to construct a digital library. Many of the
skills are too specialized for a librarian, or any other single person, to acquire alone. Therefore,
digital library development projects are very much a team effort. The skill set of a
typical digital library team may be as follows:
• Technical skills (knowledge of IT hardware/software);
• Project management;
• Database development;
• Cataloguing (metadata);
• Computer programming;
• Web design;
• Subject specialists;
• Preservation (document formats and long-term storage media);
• Photography;
• Graphic design/digitization skills; and
• Volunteer/student help.
6 Digital Archiving in the Framework of Information Life Cycle Management
The information life cycle framework consists of creation, acquisition,
cataloguing/identification, preservation and access. A brief description of each is given
below.
6.1 DLs- Creation
Building digital libraries begins with creating digital content and collections. Creation is
the act of producing the information product. The creator may be a human author or
originator, or a piece of equipment such as a sensing device, satellite or laboratory
instrument.
Several key practices are involved in archiving projects. First, the creator may
be involved in assessing the long-term value of the information.
Secondly, the preservation and archiving process is made more efficient when attention is
paid to issues of consistency, format standardization and metadata description at the very
beginning of the information life cycle. Limits are placed on both the software that can be
used and on the format and layout of the documents in order to make short- and long-term
information management easier.
6.1.1 The digital content may be
a. Born digital. These materials originate in electronic format.
b. Digitized. These materials are converted to a digital format from an initial analog
form.
6.1.2 Digitization
Digitization is the process of creating digital files by scanning or converting analogue materials.
The technology used for digitizing analogue objects is called scanning and the equipment is
called a scanner. Scanning relies on either imaging technology or OCR (optical character
recognition) technology.
The imaging process creates a photographic image of the paper document, which may later be
converted to PDF format or saved as a JPEG or bitmap image, according to the policy or standard
adopted for digitization. The image can only be read and cannot be edited.
OCR technology converts the scanned image into editable electronic text, either as plain text
or in a word processor document format.
6.2 DLs- Content Selection and Acquisition
Content selection and acquisition is the stage in which the created object is
“incorporated” physically or virtually in the archive. The object must be known to the
archive administration. There are two main aspects to the selection and acquisition of
digital objects – content selection policies and acquisition procedures.
6.2.1 DLs – Content Selection policies
The type, size and format of the digital content selected for a DL are the main factors that
dictate the technological requirements: the hardware, software and IT
capabilities needed in the future.
One should be very clear about what content will constitute the DL. Will the DL’s
content consist only of the internal documents of an organization, or will external documents
also be included? If external documents are included, will they be free or
acquired from commercial vendors? If purchased from commercial vendors, will there
be an ongoing expense or a one-time expense? For example, subscribing to an e-journal involves
an ongoing expense.
If electronic resources are purchased from commercial sources, will they be owned by the
DL after purchase, or will the DL merely have access to them through a user name and
password or through an IP address? If a document is purchased in electronic format, will
the print format be continued or not? In both cases, the utility for the present and
the future should be assessed.
For access to electronic resources through a license, two aspects have to be
considered: the number of users and Internet speed. License cost may differ with
the number of users. Accessing electronic resources through a license requires Internet
access, and fast Internet access is very expensive in developing countries. One
should calculate the ongoing cost of the required Internet speed. If the electronic resources are
purchased on CD, the costly Internet access need not be paid for. One should
weigh the pros and cons of accessing electronic resources through the Internet or
through a CD server or local computer.
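The weighing of online access against CD-based access can be sketched as a small calculation. The function names and all figures below are hypothetical placeholders, not real prices; the sketch only illustrates that a license and its Internet cost recur every year, whereas a CD purchase is largely a one-time cost.

```python
def online_cost(license_per_year, internet_per_year, years):
    """Total cost of licensed online access over a planning period."""
    return (license_per_year + internet_per_year) * years


def cd_cost(purchase_price, update_per_year, years):
    """Total cost of a CD-based collection served from a local machine."""
    return purchase_price + update_per_year * years


# Hypothetical figures in an arbitrary currency unit (assumptions only).
YEARS = 5
online = online_cost(license_per_year=1200, internet_per_year=800, years=YEARS)
local = cd_cost(purchase_price=3000, update_per_year=400, years=YEARS)

cheaper = "online" if online < local else "CD/local"
print(f"online: {online}, CD/local: {local}, cheaper over {YEARS} years: {cheaper}")
```

A real comparison would also have to count factors the sketch leaves out, such as the number of simultaneous users and how current the CD content is.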
Another consideration is how the users will use the DL. Will
access be free of cost or paid? What technology will be required for accessing the DL? Are
the infrastructure and manpower available for accessing the DL? Do the users know how to use it,
or is training required for accessing electronic resources?
6.3 DLs- Content Acquisition
Just as with print documents, the volume of e-documents will keep increasing.
The acquisition and ongoing loading of e-documents will be a regular routine.
Consideration should be given to whether the loading will be centralised or distributed. If
the loading is distributed, then consideration should be given to loading capability,
firewalls for security and the manpower available.
There should be compatible software for accessing electronic resources. If conversion is
needed for access (e.g. from text to HTML or PDF), it should be ensured that the
hardware and software for conversion are available.
Consideration should also be given to whether acquisition is made through archived links.
If it is, there should be a policy on refreshing the archived links, on the
gathering approach and on determining the extent of what is gathered.
6.4 DLs- Identification and Cataloguing
Once the archive has acquired the digital object, it is necessary to identify and catalogue
it. Both identification and cataloguing allow the archiving organization to manage the
digital objects over time. Identification provides a unique key for finding the object and
linking that object to other related objects. Cataloguing in the form of metadata supports
organization and access.
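The role of the identifier as a unique key and as a link between related objects can be sketched as follows. This is an illustration under stated assumptions, not a recommended scheme: the identifier is derived here from a SHA-256 hash of the content, whereas an archive would more commonly assign persistent identifiers through a registry. The names `make_identifier`, `register` and `catalogue`, the `dl:` prefix and the sample contents are all hypothetical.

```python
import hashlib

catalogue = {}  # identifier -> minimal catalogue entry


def make_identifier(content: bytes, prefix: str = "dl") -> str:
    """Derive a repeatable key from the object's bytes (illustrative only)."""
    digest = hashlib.sha256(content).hexdigest()
    return f"{prefix}:{digest[:16]}"


def register(content: bytes, title: str, related=()):
    """Store a catalogue entry under its identifier and return the identifier."""
    identifier = make_identifier(content)
    catalogue[identifier] = {"title": title, "related": list(related)}
    return identifier


# The summary is linked to the report through the report's identifier.
report_id = register(b"Full text of a report ...", "Report")
summary_id = register(b"Summary of the report ...", "Summary", related=[report_id])

print(catalogue[summary_id]["related"] == [report_id])
```

Because the key is computed from the content, loading the same file twice yields the same identifier, which also gives a simple duplicate check.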
6.4.1 Metadata
One of the most challenging aspects of the digital environment is the
identification of resources available on the web as well as in the digital
repositories. The existence of searchable descriptive data increases the chance of
accessing the archived digital object for use.
Metadata is defined as “data about data” or “information about information”. It is
the information that describes significant aspects of a digital resource. Most
discussion to date has tended to emphasize metadata for the purposes of resource
discovery. Examples of metadata systems include library catalogues, archival
finding aids, and museum inventory control or registration systems. Over the years,
metadata formats have been developed for a wide range of digital objects. Within
this range of formats, there is a degree of consistency across all metadata schemes
that supports interoperability. For example, most schemes provide for a title field,
date field, and identifier field.
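The point about shared fields can be illustrated with a small crosswalk. In this sketch, records from two invented local schemes are mapped onto the common title, date and identifier fields. The scheme names, field names and sample records are hypothetical and are not taken from any published metadata standard.

```python
# Hypothetical local field names mapped onto the shared fields.
CROSSWALKS = {
    "catalogue": {"main_title": "title", "pub_year": "date", "call_no": "identifier"},
    "archive": {"heading": "title", "created": "date", "ref_code": "identifier"},
}


def to_common(record, scheme):
    """Keep only the fields that have a shared equivalent, renamed to it."""
    mapping = CROSSWALKS[scheme]
    return {common: record[local] for local, common in mapping.items() if local in record}


# Invented sample records in the two local schemes.
book = {"main_title": "Sample book", "pub_year": "2001", "call_no": "025.04 SAM", "shelf": "B2"}
letter = {"heading": "Sample letter", "created": "1998-05-12", "ref_code": "ARC/12/7"}

common_records = [to_common(book, "catalogue"), to_common(letter, "archive")]
for entry in common_records:
    print(entry)
```

Fields with no shared equivalent, such as the shelf location above, are dropped in the common view. This is the usual trade-off of interoperability through a small common core.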
There is usually a direct relationship between the cost of metadata creation and the
benefit to the user. For example, applying standard subject vocabularies and classification
schemes is more expensive than assigning a few keywords, but gives the user more consistent access points.
6.5 Preservation
Preservation is the aspect of archival management that preserves the content as well as
the look and feel of the digital object. There is no common agreement on the time frame
for the preservation of digital objects. However, it is estimated that the
cycle for hardware/software migration is 2 to 10 years.
6.5.1 Hardware and Software Migration
New releases of databases, spreadsheets and word processors can be expected
every six months to three years, with patches and minor updates released more often. Although
software vendors provide backward compatibility for some versions, this generally no longer
applies after two or three version changes. The problem becomes serious when firms dealing
in computer hardware, software and peripherals close down, are sold or merge.
The best practice for the foreseeable future will be migration to new hardware and
software platforms; emulation will begin to be used if and when the hardware and
software industries begin to endorse it.
6.5.2 Preservation of the Look and Feel
Several approaches are being used to preserve the “look and feel” of material. For journal articles,
the majority of the projects reviewed use image files (TIFF), PDF or HTML. TIFF is the
most prevalent for those organizations that are involved in any way with the conversion
of paper backfiles. Because OCR technology is only 95% accurate, the OCR text is used only for searching,
while the TIFF image is the actual delivery format that the user sees. However, this does not
allow the embedded references to be active hyperlinks. HTML is another popular
format used for archiving documents.
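A short calculation shows why text at this level of OCR accuracy serves for searching but not for delivery. The sketch assumes that the 95% figure is a per-character accuracy and that character errors are independent of one another. Both are simplifying assumptions, and the page length is a hypothetical value.

```python
def expected_errors(characters, accuracy):
    """Expected number of wrongly recognised characters on a page."""
    return round(characters * (1 - accuracy))


def word_intact_probability(word_length, accuracy):
    """Chance that every character of a word is recognised correctly,
    assuming character errors are independent (a simplification)."""
    return accuracy ** word_length


PAGE_CHARACTERS = 2000  # hypothetical length of one scanned page
ACCURACY = 0.95         # assumed to be per-character accuracy

errors = expected_errors(PAGE_CHARACTERS, ACCURACY)
p_word = word_intact_probability(6, ACCURACY)

print(f"expected character errors per page: {errors}")
print(f"probability that a six-letter word is intact: {p_word:.2f}")
```

Under these assumptions most words on a page are still recognised correctly, so a search will usually find the page, but the text contains too many errors to be shown to the reader in place of the image.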
For purely electronic documents, PDF is the most prevalent format. It provides a
replica of the PostScript format of the document, but relies upon proprietary encoding
technologies. A document published on the Internet in PDF format consumes more
bandwidth than the same document in HTML format; however, in HTML format the tables and
pictures cannot be reproduced as exact replicas of the original document.
6.6 Access
The e-documents contained in a digital library may be accessed through search and
retrieval software. Searches may be of multiple types:
1. Structured or metadata-driven searches, in which the software runs through the metadata
elements and retrieves documents based on the analysis of contents done by the
creator of the metadata record. This can be compared with a library catalogue, and the
efficiency of retrieval depends on the cataloguer’s work.
2. Object searches, which are based on text-tagging and indexing work done either manually or
by indexing software. This capability allows full-text, multimedia and object
searches.
3. Global searches and resource-type searches, e.g. of e-journals and reports.
4. Structured presentation, which is the display of the target document retrieved through the
search or from a direct link.
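The difference between the first two search types can be sketched with a toy collection. The metadata search looks only at fields a cataloguer has filled in, while the full-text search uses an inverted index built automatically from the text. The documents, field names and function names are hypothetical, and real retrieval software does considerably more (ranking, stemming, multimedia indexing).

```python
import re
from collections import defaultdict

# A toy collection: each document has cataloguer-supplied metadata and full text.
DOCUMENTS = {
    "doc1": {"metadata": {"title": "Annual report", "type": "report"},
             "text": "The library digitized two thousand pages this year."},
    "doc2": {"metadata": {"title": "Preservation notes", "type": "e-journal"},
             "text": "Migration keeps digitized pages readable on new software."},
}


def metadata_search(field, value):
    """Structured search: match against one metadata field only."""
    return sorted(doc_id for doc_id, doc in DOCUMENTS.items()
                  if doc["metadata"].get(field, "").lower() == value.lower())


def build_index(documents):
    """Inverted index: word -> set of documents whose text contains it."""
    index = defaultdict(set)
    for doc_id, doc in documents.items():
        for word in re.findall(r"[a-z0-9]+", doc["text"].lower()):
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCUMENTS)


def full_text_search(query):
    """Full-text search: documents containing every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    return sorted(set.intersection(*(INDEX.get(w, set()) for w in words)))


print(metadata_search("type", "report"))
print(full_text_search("digitized pages"))
```

Restricting a search to one resource type, as in item 3, is in this sketch simply a metadata search on the `type` field.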
6.6.1 Access Management
In traditional libraries, access was possible only through the owner that preserved the
documents. For materials stored digitally, users work on exact copies of the original
works stored on their local computers. By separating usage from the original in this way,
digital technology affords multiple, simultaneous uses.
Managing access to digital content, however, is one of the most difficult issues
in digital archiving. What rights does the archive have? What rights do various user
groups have? What rights has the owner retained? How will the access mechanism
interact with the archive’s metadata to ensure that these rights are managed properly?
Rights management includes providing or restricting access as appropriate, and changing
the access rights as security levels change.
Managing access rights takes a sizeable amount of work and is highly technical in nature. Being an
ongoing task, it adds extra work to library activity, involving user ID and password
management, proxy/IP authentication management, unauthorised access management,
home-based users, etc. All these activities need a full-time computer expert on the library
staff, a need that is growing in more and more libraries.
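Two of the mechanisms named above, IP authentication and user ID/password management, can be sketched together. On-site users are recognised by network address, and home-based users fall back to a user ID and password. The network range (a reserved documentation range), the user record and the function names are hypothetical; a production system would rely on an established authentication service rather than on code like this.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical on-site network; 192.0.2.0/24 is a reserved documentation range.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def _hash(password, salt):
    """Store only a salted hash of the password, never the password itself."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical user table for home-based users.
USERS = {"reader1": {"salt": b"demo-salt", "hash": _hash("s3cret", b"demo-salt")}}


def ip_allowed(address):
    """IP authentication: is the request coming from an on-site network?"""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in CAMPUS_NETWORKS)


def password_ok(user, password):
    """User ID/password check using a constant-time comparison."""
    entry = USERS.get(user)
    if entry is None:
        return False
    return hmac.compare_digest(entry["hash"], _hash(password, entry["salt"]))


def may_access(address, user=None, password=None):
    """Grant access to on-site addresses, or to remote users who log in."""
    if ip_allowed(address):
        return True
    return user is not None and password is not None and password_ok(user, password)


print(may_access("192.0.2.15"))
print(may_access("203.0.113.9", "reader1", "s3cret"))
```

Keeping the network list and the user table up to date is the ongoing task the text refers to.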
7 Narrowing the Technological Gap
Adapting quickly to new technology is vital for success. But technological adaptation is one
thing and growth is another. The growth has to come from within. In order to promote
continuous upgrading of digital libraries, each country should set up a coordinating committee to
interact with various agencies, especially those concerned with computing and telecommunications. Nepal, in
particular, where technologies of all ages are operating, needs a complete transformation in
information policy to pave the way to the digital library era (Varis, 1998).
8 Conclusions
The above discussion has covered the aspects of developing digital libraries, from creation to
access management, and the challenges to be faced by librarians.
Standards for creating digital objects and metadata address archiving issues. Acquisition can be
done manually or through automated gathering software, and guidelines are available to answer
the questions related to selection. Standards for cataloguing and unique identification are
important in order to make material known to the archive administration. Issues of preservation
are closely linked to the continuous development of new technologies. Current practice is to
migrate from one storage medium, hardware configuration and software format to the next.
Access mechanisms, being hardware and software based, have their own migration issues. In
addition, there are concerns about access rights management, security and version control.
Adopting technology and growth are two different things. Growth needs to come from inside.
The above process demands rich knowledge and experience in traditional librarianship as well
as the learning of IT skills. This widens the existing gap in IT knowledge and training in
librarianship in Nepal. Support from the government of Nepal and from librarians is needed to fill
this gap.
References
American Digital Library Federation, 1998. A working definition of digital library. Available
at http://www.clir.org/diglib/didefinition.htm. (Accessed 10 Dec, 2003)
Falk H. 2003. Developing digital libraries. The Electronic Library 21(3): 258-261.
Hodge GM. 2000. Best practices for digital archiving: an information life cycle approach.
D-Lib Magazine 6(1): 1-16.
Jaswal BA. 2003. Developing digital libraries: concepts, procedures, technologies. Pakistan
Library Bulletin 34(1): 1-11.
Sharma RK and Vishwanathan KR. 2001. Digital libraries: development and challenges.
Library Review 50(1): 1.
Varis T. 1998. Communications in the future UN system. In: The Future of the United
Nations System: Potential for the Twenty-first Century, edited by Alger CF. Tokyo: UNU Press,
pp. 379-81.