Introduction to Digital Libraries

Digital Libraries Defined

What is a Library?
Main Entry: li·brary
Pronunciation: 'lI-"brer-E
1) a place in which literary, musical, artistic, or reference materials (as books, manuscripts, recordings, or films) are kept for use but not for sale
2) a series of related books issued by a publisher
3) a collection of publications on the same subject
4) (in genetics) a collection of DNA sequences representing the genetic material of a particular organism or tissue

What is a Library?
A collection of books, documents, newspapers, and audiovisual materials, kept and organized for people to read or borrow.

Characteristics
1. Collection of data objects
2. Collection of metadata structures
3. Collection of services
4. Domain focus
5. Quality control
6. Preservation

A History of Libraries
Lyceum - Ancient Greece – http://en.wikipedia.org/wiki/Lyceum
Alexandria - Ancient Egypt – http://en.wikipedia.org/wiki/Library_of_Alexandria
Boston Public Library - First US public lending library (1848) – http://www.bpl.org/
– “The commonwealth requires the education of the people as the safeguard of order and liberty”

the memex
The memex was a proposed desktop machine that would store millions of books on microfilm. It would have a mechanism that would allow any known item to be retrieved from the collection rapidly. But the problem remains: which items should one look at?

memex
Vannevar Bush’s vision
– How far have we come?
– What did you notice about this article: style, content, background, or anything else?
– Did the article suggest anything you would not want to see happen?
Image source: kelty.rice.edu/375/images/memex/camera.jpg http://www.knowledgesearch.org/presentations/etcon/images/memex.gif

digital libraries
• Generally, we can think of digital libraries as
– information stored on a computer
– delivered via a network
– mimicking existing libraries

[Figure: Digital Library access to multimodal data (image, text, video, audio)]

the semantic web
The semantic web is the actual successor to Lick’s vision. It’s still not done.
He also had too optimistic a vision about AI.

[Figure: Searching images by query example, with relevance feedback from positive and negative examples (ICUDL06, YT Zhuang)]

What is a Digital Library (DL)?
“…a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network”
– there are any number of alternate definitions, but this seems fair enough

DL Definition, according to Gladney, H.M., et al. (1994)
“A digital library service is an assemblage of digital computing, storage, and communications machinery together with the software needed to reproduce, emulate, and extend the services provided by conventional libraries based on paper and other material means of collecting, storing, cataloguing, finding, and disseminating information.”

DL Definition (1)
– Paul Duguid (1997) defines the digital library as an environment that supports the full information life cycle, going beyond a digital collection with information management tools.
The concept of a "digital library" is not merely equivalent to a digitized collection with information management tools. It is rather an environment to bring together collections, services, and people in support of the full life cycle of creation, dissemination, use, and preservation of data, information, and knowledge. (Duguid, Paul, 1997).

DL Definition (2)
– The Internet is the digital library.
• Many different groups have appropriated the word “library” to signify simply a collection of digital objects that people can access from their desktops.
• But is this a "digital library"?
• For many common library requests, locating information on the Internet remains highly inefficient compared to traditional library sources; finding information is difficult.

DL Definition (3)
– Digital libraries will be cheaper than print libraries.
• A common assumption among technology reporters about the costs of "digital libraries" is that digital is cheaper than paper.
• It is no surprise that digital content providers are resorting to contract agreements and licensing mechanisms instead of normal copyright provisions.

Definitions
Digital Library: A collection of electronic resources that provides direct or indirect access to a systematically organized collection of digital objects.
Hybrid Library: Provides services in a mixed-mode environment, electronic and paper, particularly in a coordinated way. The term derives from a strand of eLib which explored the issues surrounding the retrieval and delivery of information in these types of environment, but also investigated the integration of different electronic services so that a single search approach could be offered to the end user.
Virtual Library: Access to electronic information in a variety of remote locations through a local online catalogue or other gateway, such as the Internet.

Advantages: Why digital libraries?
Access: brings library to users – always available; better and wider delivery
Sharing: information resources; linking
Timeliness: easier to keep current
Searching, browsing: use of computer power
Information resources: new forms possible
Services: new & new forms possible
Costs: may save effort, money??

benefits: availability
Digital libraries bring the information closer to the user than physical libraries can
– physically
– temporally
Even when you are in the physical library, you still get faster access to digital library items.

benefits: findability
Information can be more easily found in digital than in print. Some non-textual information is still only findable via metadata. But computer scientists are working on that.

benefits: sharing
Information can be shared. Items cannot be damaged. Items cannot be stolen.

benefits: updating
Information can be kept up-to-date more easily. To update a book, you have to reprint all copies and replace them.

benefits: new media
Information can be created and manipulated in completely new ways.
For example, location information can be combined with subject information.

issue: costs
The cost of storing print information is very high. It is a multiple of acquisition costs. Digital storage devices decline in price. But digital information manipulation requires skills that are not easy to procure. The overall cost comparison is difficult to assess.

The Study of Digital Libraries is Multidisciplinary
computer science – tools, protocols, transport
information science – models of information access and storage
human factors – usability, adaptability
law – rights management
economics – “it’s all about using…”

Problems for libraries
Integration between print and digital
– mixing new digital technology with print, local with global; managing diverse resources: all difficult
– economic trade-off decisions; new economic relations
Competition for scarce resources sharpening
Institutional, cultural & social adjustments not easy
Bridging the digital divide
Resistance, threats:
– guerilla warfare within and nuclear annihilation without

drawbacks: monopoly dangers
Since the information only needs to be kept in one copy that others can access, there is an inherent danger that monopolies build up. One example is the Google search engine.

digital information was hard to use
Computers had to be driven by esoteric commands. Screens were hard to read from. Telephone lines were hard to get to work for transmitting information. Access costs for digital information were high. The service aspect was important.

Economic issues
Costs not insignificant - WHO PAYS?
– Two traditions: old - users, new (“free”) - providers
Dilemma in library budgets
– licensing of digital publications vs. subscriptions
Publishers’ economics for digital publications
– approaches vary, not settled, even scared
– even: who is a publisher?
- lines blurring
Economics of digital libraries still up in the air
– room for research & experimentation

Social issues
Legal issues: copyright protection, security
Individual: privacy protection; rights; obligations
– role in information exchanges, work, needs; life ...
Organizations: integration; changing structure
Traditional libraries: disappearing? changing?
Education: impact on all levels; integration
Computing & society: disparity between information rich & poor; digital divide; equity

Example: The Internet Archive
Example: National Library of Australia
Example: National Library of Sweden
Egyptian Universities Libraries
Egyptian Libraries

building aspect
Building a digital library can basically take three forms
– electronic resource management
– repository building
– cross-repository services

Types of Digital Libraries
1. Stand-alone Digital Library (SDL)
2. Federated Digital Library (FDL)
3. Harvested Digital Library (HDL)

Stand-alone Digital Library (SDL)
This is the regular classical library implemented in a fully computerized fashion. An SDL is simply a library in which the holdings are digital (scanned or digitized). The SDL is self-contained: the material is localized and centralized.
The ACM Digital Library
IEEE Computer Society DL

Federated Digital Library (FDL)
This is a federation of several independent SDLs, organized around a common theme and coupled together on the network. An FDL comprises several autonomous SDLs that form a networked library with a transparent user interface. The different SDLs are heterogeneous and are connected via communication networks.
Networked Digital Library of Theses & Dissertations

Bibliographic Navigation Tools for Digital Libraries
SCOPUS ELIN Knowledge Cite Library Database Advisor OCLC’s FirstSearch

Harvested Digital Library (HDL)
This is a virtual library providing summarized access to related material scattered over the network.
An example of an HDL is the Internet Public Library (IPL).
1. An HDL holds only metadata with pointers to the holdings that are "one click away" in cyberspace.
2. Developed by library professionals or computer scientists

Four Cornerstones of a Digital Library
Community
Computer
Communication technologies
Content

Contents
Images: .BMP .TIF .GIF .PNG .WMF .PICT .PCD .EPS .EMF .CGM .TGA .JPG
Animation: .ANI .FLI .FLC
Video: .AVI .MOV .MPG .QT
Audio: .WAV .MID .SND .AUD .mp3
Web Page: .HTM .HTML .DHTML .HTMLS .XML
Text: .DOC .TXT .RTF .PDF
Programs: .COM .EXE

Contents: Markup standards
1. Hypertext Markup Language (HTML); http://www.w3.org/MarkUp/
2. Extensible Markup Language (XML); http://www.w3.org/XML/
3. Standard Generalized Markup Language (SGML); http://www.w3.org/MarkUp/SGML/

Contents: Metadata standards
1. Dublin Core; http://dublincore.org/
2. MARC 21; http://lcweb.loc.gov/marc/
3. Encoded Archival Description (EAD); http://lcweb.loc.gov/ead/

How is a DL Different from a Traditional Library?
A TL has as its focus physical objects
– even if the card catalog (metadata) is electronic, the purpose is to point you to a physical location
– trafficking in physical objects has both obvious and subtle implications
• an object can exist in only 1 place
• if you have it, I can’t have it (zero-sum distribution)
• I have to go to the object, or wait for it to come to me

TLs vs. DLs
DLs are clearly better than TLs at:
– dissemination, storing a variety of information
However, TL objects are more survivable
– Who will archive the research information?
• the publishers?
• the institutions?
• the authors?
– Will the average DL object still be accessible in 10 years?

How is a DL Different from a Traditional Library?
• Digital Library
• removing the physical restriction has obvious benefits
• multiple access, multiple listings, electronic transmission
• also complicates many other issues...
• intellectual property, terms and conditions, etc.
• Note that a TL offers additional social and educational benefits
• Most TLs offer hybrid services too.

TLs vs. DLs
Where does publishing stop, and libraries begin?
– there have always been tensions between TLs and traditional publishers, but the roles were fairly well defined
– DLs can muddle the separation of these responsibilities
• result: conflict, and/or new models

[Figure: Traditional players (book store, publisher, library, archive) and their responsibility over time]

How is a DL different from a database?
A traditional SQL database has as its basic element data items in a relation:
    SELECT name
    FROM employee, project
    WHERE employee.deptnumber = '25' AND project.number = '100'
databases exploit known structures and relations

How is a DL different from the WWW?
The keyword is managed
– The WWW is not managed
Some meta searchers (Yahoo, Lycos, Google) attempt to add an organizational framework to their web holdings
– However, most are focused on keyword searching (e.g., Google)

How is a DL different from the WWW?
Another key difference is who controls the input into the system
– most meta searchers hunt down their holdings
– some (DMOZ, Yahoo) have humans in the loop for review and classification
DLs are generally more tightly controlled, and have a targeted customer set
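
As a rough illustration of the point that databases exploit known structures and relations, the sketch below runs a query like the one on the database slide against an in-memory SQLite database. The slide gives only the query text, so the table layout, the `project_number` column linking employees to projects, the explicit join on that column, and the sample rows are all assumptions made for this example.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical employee/project schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (number TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE employee (
            name TEXT,
            deptnumber TEXT,
            project_number TEXT REFERENCES project(number)  -- assumed link column
        );
        INSERT INTO project VALUES ('100', 'Digitization'), ('200', 'Cataloguing');
        INSERT INTO employee VALUES
            ('Ada', '25', '100'),
            ('Ben', '25', '200'),
            ('Cy',  '30', '100');
        """
    )
    return conn


def employees_on_project(conn, dept_number, project_number):
    """Return names of employees in a department who are assigned to a project."""
    rows = conn.execute(
        """
        SELECT e.name
        FROM employee AS e
        JOIN project AS p ON p.number = e.project_number
        WHERE e.deptnumber = ? AND p.number = ?
        ORDER BY e.name
        """,
        (dept_number, project_number),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(employees_on_project(conn, "25", "100"))
```

The query only works because the column names and the relation between the two tables are known in advance. A digital library usually cannot assume that much structure about its holdings, which is one reason it relies on descriptive metadata and managed services instead.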