Chapter Nine
Visions: Future, Past, and Present
How to Build a Digital Library
Ian H. Witten and David Bainbridge

Visions: Future, Past, and Present
- Digital libraries have practical advantages over physical ones
- Digital libraries offer the promise of far greater universality

Mission of a Library
The mission of a library is twofold:
- To collect, organize, and provide access to information
- To pass it down to succeeding generations as a record of culture

The Librarian’s Duty
The librarian has twin duties:
- Access to the world’s literature for today’s readers
- Preservation for future generations

Challenges for Digital Libraries
- Today’s collections are mostly text
- The real challenge is to create collections of digital documents in diverse media types
- Examples:
  - Music libraries that can be searched by humming

Libraries of the future

Libraries of the Future
Digital libraries
- Have the potential to be far more flexible than conventional ones
- Will be large
- Will not be static

Today’s Visions
- Impersonal and utilitarian
  - Example: Figure 9.2
- Real people in a real environment
  - Example: Figure 9.3, Kataayi cooperative in Uganda
  - Low tech

Today’s Visions
- Libraries are about connecting people with the information they need

Tomorrow’s Visions
- Sci-fi image
- Personalized space
- A kitchen for knowledge preparation
- Workshop emphasis of preservation over access
- Comfortable, personalized, dynamic, up to date

Your Visions?

Librarianship
- Librarianship: selection, organization, and maintenance
- Wisdom and value judgments
  - What information to include
  - How to organize the information

Working Inside the Digital Library
- Digital library
  - A library without walls but with boundaries
- Working inside the digital library:
  - An environment that surrounds in an intellectual sense
  - More or less immersive
  - Reacts and responds

Preserving the past

The Problem of Preservation
- Technological progress comes at the expense of preservation

The Problem of Preservation
- Paper
  - Acid-based paper decomposes after only a few decades
- Film
  - Film containing nitrate decays quickly
- Analog audio
  - Wax cylinders or magnetic tapes must be preserved by transferring onto digital formats

The Problem of Preservation
- A process of regular copying can be established to preserve digital material without loss

The Digital Dark Ages
- “No one understands how to archive digital documents”

Preservation Technology
- Enormous amounts of digital information are already lost forever
- Information technologies become obsolete very quickly
- Document and media formats continue to proliferate
- Technology standards will not solve fundamental issues in the preservation of digital information

Availability of Material
- Libraries will shortly see a demographic bulge of electronic material as the baby boom generation of authors and academics contribute material gathered during their careers
- Much material will never make it into library collections for preservation because of increasingly restrictive intellectual property and licensing regimes
- Archiving and preservation functions in a digital environment will increasingly become privatized as information continues to be commodified

Traditional Library Functions
- Financial resources available to libraries and archives continue to decrease
- Libraries and archives will be required to continue their existing archival and preservation practices as the current paper publishing boom continues

Preservation Strategies
- Digital documents are vulnerable to loss because the media on which they are stored decay and become obsolete
- They become inaccessible when the software or hardware needed to read them becomes obsolete

Preservation Strategies
- Digital formats have advantages over analog formats
- Digital formats seem to promote preservation
- The advantages make digital preservation even harder

Preservation Strategies
- Ease of creation causes information glut
- Ease of copying makes “copies” seem dispensable
- Improvements in hardware and software promote obsolescence

Preservation Strategies
“May all your problems be technical ones”
- Computer people recognize that the technical problems can be solved
- It’s the human part that causes problems
- Administrative and political processes take time and cause frustration
- Technical problems have solutions which yield to honest intellectual work

Preservation Strategies
- Preservation is not a technical problem

Preservation Strategies
Four Preservation Strategies
- Paper
- Museums
- Emulation
- Migration

Preservation Strategies
- Paper and Museums
  - Involves printing the material on paper or microfilm and storing in museums
  - Not considered a long-term preservation strategy
- Emulation and Migration
  - Involves preserving the physical stream of bits and/or the logical means by which the bits are interpreted as a document

Preservation Strategies
Emulation
- Keeping the documents in exactly the same form
- Emulate the functionality of the original, obsolete system on future, unknown systems

Preservation Strategies
Preserving the physical bit stream
- Regular copying to new media
- Error detection to determine if degradation is occurring
- Error-correcting codes to ensure new generations are faithful copies of the original

Preservation Strategies
Preserving the logical interpretation
- Emulate old interpreters on new hardware
- Backward compatibility

Preservation Strategies
- An important feature of any format used for preserving documents is that it is open: the details are made publicly available
- It must be open in principle as well as practice
- Documented well enough for others to understand and build their own interpreters
- Examples: PostScript and PDF

Preservation Strategies
Migration
- Translating the document from the old format to a format accepted by new software
- Designed for near-obsolete software
- Involves copying the physical bit stream to new media
- Involves translation to a new logical format

Preservation Strategies
Emulation or Migration?
- Migration may be cheaper
  - No special emulation software needs to be written
  - Conversion software is usually available
- Conversion is a kind of translation
  - May lose features of the data

Generalized documents: A challenge for the present

Generalized Documents: A Challenge for the Present
- Text remains the principal means for searching and browsing collections, even when they contain documents in other media
- Multimedia documents can be displayed
  - Linked to text documents
  - Text may contain only captions
  - Text is browsed and searched

Digital Libraries of Music
- Music information retrieval
  - Motifs in music are analogous to key phrases in text
- OMR
  - Optical music recognition
  - Music analog of OCR

Other Media
- Images
- Videos
- Objects

Other Document Types
Images
- Thumbnails
  - Visual material can be rapidly browsed using thumbnails
  - Captures the reader’s attention
  - Gives a feeling for what the collection is about
- Difficult to automatically search images rather than manually browse them

Videos
- Video
  - A sequence of pictures?
- Cut detection
  - Techniques for locating where the scene changes
- Movies
  - Browsed and manipulated using thumbnails
  - Each thumbnail represents a typical image or the initial image in a scene

Objects
- Realia
  - Real artifacts
  - Computer graphics allow three-dimensional objects to be captured in the form of a data set
- Artifacts
  - In libraries and museums, artifacts are indexed and located on the basis of metadata
- Books
  - Can be modeled as physical objects

Other Document Types
- Teaching material
- Research material
- Multimedia elements
- Laboratory notebooks
- Scientific and engineering data
- Results of experiments, simulations, and surveys
- Information is expressed in many forms

Generalized Documents in Greenstone
- Digital Library
  - Focused collection of digital objects, including text, video, and audio
- The Challenge
  - Integrate objects of all kinds of media into digital libraries in such a way that each becomes a first-class citizen

Generalized Documents in Greenstone
- Greenstone does not incorporate searching and browsing techniques for non-textual media

Generalized Documents in Greenstone
Solutions to current Greenstone limitations:
- New modules can be added
- A new search engine can be deployed by replacing or augmenting the MG system that does text searching
- Browsing horizontal and vertical lists can be handled by adding a new classifier
- New browsers can be added through Perl code
- New media types can be imported by adding new plug-ins

Digital Libraries for Oral Cultures
- Libraries are about literature
- Literature: the writings of a society, in prose or verse
- Broadly speaking, literature includes all types of fiction and nonfiction writing intended for publication

Digital Libraries for Oral Cultures
- It should be possible to create digital library collections intended for people in oral cultures
- Useful for people who may be illiterate or semiliterate
- Useful for people who cannot speak or read the language of the digital library

Digital Libraries for Oral Cultures
Iconic Form
- Serious practical information can be conveyed in a purely iconic form
- Examples
  - How to splint a broken forearm
  - User manual for underground transport system
- Historical precedent of Beggar’s Bibles

Digital Libraries for Oral Cultures
Libraries for the illiterate
- We are all illiterate with respect to some other languages and cultures
- Media types:
  - Static images
  - Motion, sound, video, interaction, 3D objects, simulations, virtual reality
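
Supplementary sketch: preserving the physical bit stream
The preservation slides describe regular copying to new media combined with error detection. The Python below is one minimal way that idea might look, assuming SHA-256 checksums as the detection method (the slides do not name one). The function names sha256_of, copy_with_fixity, and is_intact are hypothetical and chosen only for illustration. The sketch detects degradation; it does not correct it, which would need an error-correcting code or a second good copy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_with_fixity(source, target):
    """Copy source to target and check the copy is bit-for-bit identical.

    Returns the digest so it can be recorded and re-checked later.
    """
    expected = sha256_of(source)
    shutil.copyfile(source, target)
    if sha256_of(target) != expected:
        raise IOError(f"copy of {source} does not match the original")
    return expected


def is_intact(path, recorded_digest):
    """Error detection: has the file changed since the digest was recorded?"""
    return sha256_of(path) == recorded_digest


def demo():
    """Copy a file to a 'new medium', then simulate decay of one bit."""
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_medium.bin"
        new = Path(tmp) / "new_medium.bin"
        old.write_bytes(b"a document worth keeping")

        digest = copy_with_fixity(old, new)
        intact_after_copy = is_intact(new, digest)

        # Simulated degradation: flip the lowest bit of the first byte.
        data = bytearray(new.read_bytes())
        data[0] ^= 0x01
        new.write_bytes(bytes(data))
        intact_after_decay = is_intact(new, digest)

        return intact_after_copy, intact_after_decay
```

In use, the digest returned by copy_with_fixity would be stored alongside the collection and is_intact run on a schedule; a failed check is the signal to restore from another copy before the next migration to new media.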