DIGITAL LIBRARIES: AN OVERVIEW

Dr. I.R.N. Goudar
Head, ICAST
National Aerospace Laboratories
Bangalore – 560017
goudar@css.nal.res.in

One day Seminar on Digital Library Services for Technical Colleges
Basaveshwar Engineering College, Bagalkot – 587102
15 April 2006

Traditional Libraries
• Libraries with the same purpose, functions, and goals
• Collection development and management
• Technical processing
• Index creation
• Counter transactions
• Reference work
• Preservation

What is a Digital Library?
• A service?
• An architecture?
• A set of information resources?
• A set of tools to locate, search, and retrieve information?
• Possibly the tools to create such resources and services also fall within the purview of DLs
• Digital face of traditional libraries
• Includes both digital and traditional collections
• Backbone and nervous system of libraries

Defining the Digital Library
“A digital library service is an assemblage of digital computing, storage, and communications machinery together with the software needed to reproduce, emulate, and extend the services provided by conventional libraries based on paper and other material means of collecting, storing, cataloguing, finding, and disseminating information.” (Gladney H.M., et al., 1994)

“Digital Libraries are a set of electronic resources and associated technical capabilities for creating, searching, and using information…they are an extension and enhancement of information storage and retrieval systems that manipulate digital data in any medium (text, images, sounds, static or dynamic images) and exist in distributed networks” (Borgman, 1996)

What is a Digital Library?
Borgman identifies two major aspects:
• DL researchers from Computer Science focus on content for user communities and therefore emphasize the enabling technologies
• Library professionals appear to emphasize DLs as services
• However, DLs require the skills of both librarians and computer scientists

What is important?
• Site neutrality
• Access: anytime (24x7), anywhere (office, residence, travel), by anyone
• Open access and sharing of information
• Greater variety and granularity of information
• Up-to-dateness
• New forms of rendering (new genres)
• Integration of digital media into traditional collections
• Digital libraries are different in that they are designed to support the creation, maintenance, management, access to, and preservation of digital content

Five Elements in Various Definitions of DL
• The digital library is not a single entity
• The digital library requires technology to link the resources of many
• The linkages between the many digital libraries and information services are transparent to the end users
• Universal access to digital libraries and information services is a goal
• Digital library collections are not limited to document surrogates: they extend to digital artefacts that cannot be represented or distributed in printed formats
Source: Association of Research Libraries (1995)

Goals of DL
• First-generation digital library: focused on digitization technology, metadata schemes, data management techniques, and digital preservation
• Second-generation digital library: exploring new opportunities and developing new competencies
• Third-generation digital library: focusing instead on fully integrating digital material into the library’s collections through a modular systems architecture
Digital Libraries Shorten the Chain
[Diagram: the chain AUTHOR, DIGITAL LIBRARY, USER, shown against the intermediaries Reviewer, Editor, Publisher, A&I, Consolidator, Library]

Roles
• Reader
• Author
• Librarian
• Editor
• Learner
• Teacher

Ingredients for DLs
• Hardware: the minimum machinery to do the job
• Software: the programs for handling data
• Digital objects: articles, conference papers, theses, ...
• Basic skills: things one has to learn

Hardware
• A server: you'll need access to a web server
• A good PC
• Scanners
  • Flatbed – auto feed, back to back
  • MF
  • Book scanner

Software
• Open Source Software (OSS): DSpace, EPrints, Fedora, GSDL, ...
• Proprietary software you can't avoid: image editing and Optical Character Recognition software have to be purchased

Hardware-Software
• Network: high-speed local networks and fast connections to the Internet
• Relational databases that support a variety of digital formats
• Full text search engines to index and provide access to resources
• Web servers and FTP servers (both intranet and internet)
• Electronic document management functions

Digital Library Content: Content Types
• Text documents: articles, reports, books, manuscripts, newspapers, theses, tech. reports
• Video and audio: speech, music, movies
• Geographic information: (aerial) photos
• Software, programs
• Bio information: genome (human, animal, plant)
• Images and graphics: photographs, models, simulations, paintings; 2D, 3D

Content is King
• The information content is more important than the systems used for its storage, management and retrieval
• Objects should not be “locked” in specific DLs or archives

Types of Digital Collections
• Digitization: converting paper and other media in existing collections to digital form
• Acquisition of original digital works: created by publishers and scholars, such as electronic books, journals, and datasets
• Access to external materials: such as Web sites, other library collections, or publishers' servers

Resources
• Bibliographic databases that point to both paper and digital materials
• Indexes and finding tools
• Collections of pointers to Internet resources
• Directories
• Teaching and pedagogic materials
• Photographs
• Numerical data sets
• E-books and e-journals

Creating DLs: Six Steps
• Selecting
• Acquiring
• Digitizing
• Organizing
• Archiving
• Providing access

Process
• Selection of books
• Identification of books
• Metadata
• Scanning process: scanning, image processing & QC, OCR
• Publishing

Digitization
“Conversion of any fixed or analogue media--such as books, journal articles, photos, paintings, microforms--into electronic form through scanning, sampling, or in fact even re-keying.”

Digitization Process
• Determine copyright or restrictions
• Digital conversion: outsource or in house?
• Text conversion, formats, headers, compression, and delivery media
• Digital capture with camera or scanner?
• File handling
• File naming

Digitization Process
• Preparing the objects
• Scanning
• Moving files to temporary storage
• Value addition: metadata preparation etc.
• Long term storage
• Derivative image/thumbnail for access copy
• Merging files

Digital Production Process
[Diagram: Data Management, Workflow Management, Content Management, Quality Management, Project Management, Supplier Management]

Data Management
• Formats: TeX, PDF, PS
• Metadata and content data
• Structuring (tagging)
• Media neutrality

Workflow Management
• Processing
• Conversion
• Automatization
• Interfaces: input / output

Content Management
• Style files
• Information / object models
• Archiving

Quality Management
• Data consistency
• Process consistency
• Content consistency

Various Workflows
[Diagram: Input (RTF, TeX, camera ready); Processing (normalization, content processing); Output (books, journals, archive)]

Software
OCR: Optical Character Recognition
• Many good OCR programs are on the market, with prices ranging from Rs. 5,000 to Rs. 20,000. Examples, among many others:
  • Readiris (http://www.readiris.com/)
  • OmniPage (http://www.omnipage.com/)
  • FineReader (http://www.finereader.com/)

Possible Delivery Formats
• Pure image formats: TIFF, JPEG
• Open encoded formats: XML, HTML, ASCII, and Unicode
• Hybrid formats: PDF, DjVu (can contain both image and text)
• Proprietary formats: Microsoft Word, WordPerfect

Good Principles
• What to digitize?
• Selection and policy are important
• Collection description is important, such as scope, format, restrictions on access, ownership, etc.

Digitization: Issues
• Copyright
• Access copy and archive copy
• File size
• Storage media (CD, hard disc, ...)
• File format (TIFF, JPEG, ...)

Challenges in Publishing
• Preservation of layout
• Searchability of content and metadata
• Efficient image compression
• Easy browsing of books
• Accommodating low-bandwidth users
• Multilingual text support
• Multipaging

Digitization: Factors
• Collection strengths
• Unique collections
• What is reasonable for any one institution to collect or digitize
• Technical architecture
• Like demands of a curriculum
• Manageable portions of collections
• Only copies of something
• Priorities of user communities
• Digitizing selected portions
• Adding new digital works
• Also a factor in selecting who digitizes what
• Skills of staff
• Whose staff don't have the necessary skills

Retrospective Conversion
• Complete conversion would be impractical or impossible technically, legally, and economically
• Digitization of a particular special collection, or a portion of one, which is highly valued
• Highlight a diverse collection
• High-use materials
• Approaches can be used alone or in combination, depending upon a particular institution's goals

Criteria for Selecting Content
• Their potential for long-term use
• Their intellectual or cultural value
• Whether they provide greater access than possible with original materials (e.g., fragile, rare materials)
• Whether copyright restrictions or licensing will permit conversion

Metadata
• The data that describes the content and attributes of any particular item
• Key to resource discovery and use of any document
• Descriptive metadata facilitates searching and discovery; administrative and structural metadata assist in object viewing, management, and preservation
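
As a rough illustration of how metadata supports discovery, the sketch below keeps descriptive, administrative, and structural metadata apart and builds a simple word index over the descriptive part only. The `DigitalObject` class, its field names, and the two sample records are hypothetical and chosen for this example; they are not taken from any particular digital library system.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical record; the three groupings are illustrative only."""
    descriptive: dict      # e.g. title, creator (used for searching and discovery)
    administrative: dict   # e.g. rights, scan date (used for management)
    structural: list = field(default_factory=list)  # ordered page files (used for viewing)


def build_index(objects):
    """Map each lowercase word in the descriptive metadata to object positions."""
    index = {}
    for pos, obj in enumerate(objects):
        for value in obj.descriptive.values():
            for word in str(value).lower().split():
                index.setdefault(word, set()).add(pos)
    return index


def search(index, query):
    """Return positions of objects whose descriptive metadata has every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(w, set()) for w in words]
    return set.intersection(*hits)


# Made-up sample records, for demonstration only.
collection = [
    DigitalObject(
        descriptive={"title": "Wind tunnel testing report", "creator": "Example Author"},
        administrative={"rights": "institution", "scanned": "2006-04-15"},
        structural=["page-001.tif", "page-002.tif"],
    ),
    DigitalObject(
        descriptive={"title": "Palm leaf manuscript", "creator": "Unknown"},
        administrative={"rights": "public domain", "scanned": "2006-04-15"},
        structural=["leaf-001.tif"],
    ),
]

index = build_index(collection)
print(search(index, "palm manuscript"))  # positions of matching objects
```

Only the descriptive fields are indexed here; the administrative and structural parts travel with the object but play no role in the search, which mirrors the division of labour described on the slide.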
Elements of Dublin Core
• Title
• Creator
• Subject and Keywords
• Description
• Publisher
• Contributor
• Date
• Format
• Resource Identifier
• Resource Type
• Source
• Language
• Relation
• Coverage
• Rights Management

Barriers
• Digital objects are less fixed, easily copied, and remotely accessible by multiple users simultaneously
• Libraries are mostly caretakers of information and hold material subject to copyright restrictions
• Libraries need to develop mechanisms for managing copyright that allow them to provide information without violating copyright, called rights management

Rights Management
• Usage tracking
• Identifying and authenticating users
• Providing the copyright status of each digital object, and the restrictions on its use or the fees associated with it
• Handling transactions with users by allowing only so many copies to be accessed, or by charging them for a copy, or by passing the request on to a publisher

Preservation
• Keeping digital information available in perpetuity
• The real issue is technical obsolescence
• Like the deterioration of paper in the paper age
• Constantly coming up with new technical solutions

Three Types of Preservation
• Preservation of the storage medium
• Preservation of access to content
• Preservation of fixed-media materials through digital technology

Preservation of the Storage Medium
• Tapes, hard drives, and floppy discs have a very short life span
• They become obsolete within two to five years, when they are replaced by better technology
• Possibility of non-availability of the hardware or software to read them
• May have to keep moving digital information from storage medium to storage medium

Preservation of Access
• Access to the content of documents, regardless of their format
• When the formats (e.g., Adobe Acrobat PDF) containing the information become obsolete, data must be translated from one format to another to preserve the ability of users to retrieve and display the information content
• Data migration is costly
• Still no standards for data migration
• Distortion or information loss

Fixed-media through Digital Technology
• Replacement for current preservation media such as microforms
• No common standards for the use of digital media as a preservation medium
• It is unclear whether digital media are going to handle the task of long-term preservation

Digital Libraries Benefits: Individual
• Gain access to the holdings of libraries worldwide through automated catalogs
• Locate both physical and digitized versions of scholarly articles and books
• Optimize searches; simultaneously search the Internet, commercial databases, and library collections
• Save search results and conduct additional processing to narrow or qualify results
• From search results, click through to access the digitized content or locate additional items of interest
• All of these capabilities are available from the desktop or other Web-enabled device such as a personal digital assistant or cellular telephone

Digital Libraries Benefits: Classroom Projects
• Capability to enhance the classroom experience or conduct learning apart from a physical campus
• The digital library is a core component of this virtual learning environment (VLE)
• Changing the relationships between the library and other parts of the academic enterprise
• Integrate authoring, analysis, and distribution tools that facilitate the reuse and repurposing of digital content
• Collections and services can be integrated into the institutional, national, and worldwide

Digital Library Standards
• Common user interface
• Data handling and interchange
  • Graphic formats: JPEG, TIFF, GIF, PNG, Group 4 Fax, CGM
  • Structured documents: SGML, HTML, XML
  • Moving pictures/3-D: MPEG, AVI, GIF89A, QuickTime, Real Video, ViviActive, VRML
• Metadata
  • Resource description: Dublin Core, WHOIS++ Templates, US-MARC, TEI Headers, other open source and domain-specific standards
  • Resource identification: URN, PURL, DOI, SICI
• Security, authentication and payment services: emerging e-commerce standards
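
To make the Dublin Core resource description standard listed above more concrete, here is a minimal Python sketch that serializes a record as XML using the standard library. It uses the lowercase element names of the Dublin Core element set (for example `identifier`, `type`, and `rights` for Resource Identifier, Resource Type, and Rights Management). The wrapping `record` element, the `to_dublin_core` helper, and the sample values for `format`, `subject`, and `identifier` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields):
    """Serialize a dict of Dublin Core element names to values as an XML string.

    The plain <record> wrapper is an assumption of this sketch; real systems
    usually embed the dc: elements in a container format of their own.
    """
    record = ET.Element("record")
    for name, value in fields.items():
        # A repeatable element (e.g. subject) may be given as a list.
        values = value if isinstance(value, list) else [value]
        for v in values:
            child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
            child.text = v
    return ET.tostring(record, encoding="unicode")


# Illustrative values; format, subject and identifier are hypothetical.
sample = {
    "title": "Digital Libraries: An Overview",
    "creator": "Goudar, I.R.N.",
    "date": "2006-04-15",
    "type": "Text",
    "format": "application/pdf",
    "language": "en",
    "subject": ["digital libraries", "digitization"],
    "identifier": "http://example.org/item/1",
}

xml_text = to_dublin_core(sample)
print(xml_text)
```

Keeping the record as a plain dictionary until the last step makes it easy to swap the serialization for another resource description format later without touching the cataloguing data.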
Indian DL Initiatives: Contents
• Books (out of copyright)
• Scholarly journals
• Theses
• Institutional e-prints
• Manuscripts
• Data
• Newspapers
• Metadata level
• Portal and gateway services

Government, Judicial, Financial, Land Records
• Ministry websites include policy and planning documents, annual reports, budget etc.
• Goa, Andhra Pradesh, Karnataka, Maharashtra, Tamil Nadu have made significant headway
• Judgments of Supreme Court and High Courts covered

Digital Library of India at IISc, Bangalore
• Mission: free access to human knowledge through a portal
• Objectives: to capture all books in digital format (1 m by 2005)
• Test bed for improved scanning techniques, OCR, indexing
• Books, journals, palm leaves
• More than 1 lakh books in English, Telugu, Kannada, Tamil, Sanskrit, Urdu
• 100 scanners in 16 scanning centres
• Plan for 1 m documents by 2005
• Science, arts, culture, music, movies, traditional medicine
• Will be mirrored at several locations in the world
• Collaboration: Universal Library Project, CMU
• http://www.dli.ernet.in/

IISc: Other Activities
• Vigyan: website on Indian S&T; collaboration with NISSAT/DSIR
• Indo-French Cyber University: initially PG in Applied Mathematics

E-prints at IISc by NCSI (http://eprints.iisc.ernet.in/)
• Online digital repository of IISc research papers
• Research papers (preprints, post-prints), book chapters, tech reports, unpublished findings, conference papers, magazine articles
• Set up using e-prints.org open source software
• Part of worldwide institutional e-print archives

Institutional Repositories
• Indian Institute of Science
• National Aerospace Laboratories
• National Chemical Laboratory
• National Institute of Oceanography
• ISI – Mathematics
• DRTC – LDL
• Raman Research Institute
• IIM Kozhikode

Scholarly Science Journals
• Indian Academy of Sciences (IAS) – 11 journals
• Indian National Science Academy – 4 journals
• Indian Medlars Centre (IndMed) – 22 journals

Vidyanidhi: Dept. of LIS, Univ. of Mysore
• Digital library and e-scholarship portal
• Indian theses database
• Indian ETD collection
• Training program for improving quality
• Supported by DSIR, GOI
• Part of global ETD initiative
• Support by Ford Foundation and Microsoft
• http://www.vidyanidhi.org.in/
• Theses initiatives by others: IIT Delhi and IIT Mumbai

Indira Gandhi National Centre for the Arts
• Digital images
• Electronic books
• Video recordings
• Papers and essays
• Audio recordings
• Research reports
• Databases
• Newsletters
• Bibliographies
• Conference proceedings
• Multimedia documentation
• Manuscripts in India
• Kalakalpa (journal)
• In-house articles

Mumbai Asiatic Society
• Rare books (as far back as 1632)
• Manuscripts (Sanskrit, Pali, Tibetan, Prakrit, Arabic, Persian, etc.)
• Maps, coins
• Buddhist relics
• Book preservation laboratories
• Microfilming, now digitisation
• http://education.vsnl.com/asbl/treasure.html

National Library, Calcutta
• Manuscripts (work in progress): paper – 3000, palm leaf – 334
• Books (<1900, Indian <1920): 6600 titles, 2.5 M pages, 548 CDs
• Bengali journal (Prabasi)
• East India Company records
• Many diaries
• Orunudoi Assamese in Bengali journal

Archives of Indian Labour
• V.V. Giri National Labour Institute
• Heritage of Indian working class
• Commissions on labour
• Oral history collections
• Trade union collections
• Regional collections
• Strike collections
• http://www.indialabourarchives.org/

India: DL Issues
• Objects identification: non-availability, coordinated efforts
• Technology and infrastructure
• Standards: metadata
• Funds
• Networking of minds
• Multiple languages
• Inhibition and reservations (libraries and heritage materials)
• IPR

DLI in India: Suggestions
• Distributed national network with global access
• Institutional digital repositories
• Open access science journals
• Content based catalogue (metadata)
• Portal giving links to various activities
• Intensive training of librarians, archaeologists, curators, etc.
• Improve technology and infrastructure
• Adopt suitable standards
• Language tools
• Modify IPR to suit open archives
• Enhance the capability of integrated library automation systems to handle DL features
• Compilation of a directory of DL technologies and vendors

Bibliography on Digital Libraries
http://sunsite.berkeley.edu/CurrentCites/bibondemand.cgi?query=digital+library