Digital Libraries: Extending and Applying Library and Information Science and Technology
CIKM 2000, November 9, 2000
Edward A. Fox
fox@vt.edu http://fox.cs.vt.edu
CS, DLRL, Internet TIC
Virginia Tech, Blacksburg, VA, USA

Acknowledgements (Selected)
Mentors: JCR Licklider, Michael Kessler, Gerard Salton
Sponsors: Adobe, IBM, Microsoft, NLM, NSF, OCLC, SOLINET, SURA, UNESCO, US Dept. of Ed. (FIPSE), …
VT Faculty/Staff: Tony Atkins, Thomas Dunbar, Debra Dudley, John Eaton, Gwen Ewing, Peter Haggerty, Gary Hooper, Gail McMillan, Len Peters, James Powell, …
VT Students: Emilio Arce, Fernando Das Neves, Brian DeVane, Robert France, Marcos Goncalves, Scott Guyer, Robert Hall, Neill Kipp, Paul Mather, Tim McGonigle, Todd Miller, Constantinos Phanouriou, William Schweiker, Ohm Sornil, Hussein Suleman, Patrick Van Metre, Laura Weiss, …

Internet Technology Innovation Center
Supported by Virginia’s Center for Innovative Technology
Statewide University Partners - Governing Board:
Christopher Newport University
– William Winter, William Muir, Virginia Electronic Commerce Technology Center / Southeastern Virginia Network (VECTEC/SEVAnet)
George Mason University
– Scott Martin, Internet Multimedia Center (ICM)
– Steven Ruth, International Center for Applied Studies in IT (ICASIT)
University of Virginia
– Alf Weaver, Internet Commerce Group (InterCom)
– Jim French, Internet Digital Library
Virginia Tech
– Edward Fox, Digital Library Research Laboratory (DLRL), CC, CS
– Scott Midkiff, Center for Wireless Telecomm.
(CWT), VTISC, ECpE

Digital Library Courseware
http://ei.cs.vt.edu/~dlib/
WWW pages or large PDF copy files
CourseInfo quizzes based on books by Michael Lesk (MKP.com) and William Arms (MIT Press)
Contents based on books, with other popular topics added (e.g., agents)
Separate pages to supplement: Definitions, Resources (People, Projects), and References

JCDL 2001
First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg)
http://www.jcdl.org
June 24-28, 2001 in Roanoke, VA
Conference Committee:
General Chair: Edward A. Fox, Virginia Tech
Program Chair: Christine Borgman, UCLA
Treasurer: Neil Rowe, Naval Postgraduate School
Posters Chair: Craig Nevill-Manning, Rutgers U.

Why this topic today?
Many users (patrons) prefer digital libraries to traditional libraries or the Web
Digital library collections often are free or less expensive, so are heavily used
Most publishers are working toward digital libraries to allow access to their content
Computing as well as library and information science professionals are key players in building digital libraries

Outline
Challenge – WHY !
Scaling / Technology
Framework, Theory
Simplification: DC, OAI
Example Applications

Grand Libraries of the Future
JCR Licklider, 1965, MIT Press
World, Nation, State, City, Community

Licklider – Unified Theory?
Not ready in 1960s
Analog – unified field theory in physics
“Mess” today – segmented field, specialities
– Database <-> Knowledge <-> Content Mgmt
– Multimedia, Hypermedia, Hypertext
– Logic, Algebra, Artificial Intelligence, …
Expensive, annoying for users
– Don’t know where to look
– Don’t know how to use services

Digital Library Content
Content Types: Text Documents; Video; Audio; Geographic Information; Software, Programs; Bio Information; Images and Graphics
Examples: Articles, Reports, Books; Speech, Music; (Aerial) Photos; Models, Simulations; Genome (Human, animal, plant); 2D, 3D, VR, CAT

Locating Digital Libraries in Computing and Communications Technology Space
[Figure axes: Communications (bandwidth, connectivity); Computing (flops); Digital content (less to more)]
Digital Libraries technology trajectory: intellectual access to globally distributed information
(Slide from S. Griffin, NSF)

Grand Challenges Can
Mobilize the community
Spur creativity
Lead to important benefits in society
Push researchers to develop relevant theories
Force people to work in teams/groups
Convince funding agencies to invest
Help bring about integration of systems, interoperability, and seamless interfaces

DL Challenges
World Digital Library (Libraries)
Preservation - so people will trust DLs
Scalability, sustainability, interoperability
(Supporting infrastructure - networks, …)
DL industry - critical mass by covering libraries, archives, museums, corporate info, govt info, personal info - “quality WWW” integrating IR, HT, MM, ...
– Need tools & methods to make them easier to build

DLs: Why of Global Interest?
National projects can preserve antiquities and heritage: cultural, historical, linguistic, scholarly
Knowledge and information are essential to economic and technological growth, education
DL - a domain for international collaboration
– wherein all can contribute and benefit
– which leverages investment in networking
– which provides useful content on Internet & WWW
– which will tie nations and peoples together more strongly and through deeper understanding

Information Life Cycle
Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/

Digital Libraries - Objectives
World Lit.: 24hr / 7day / from desktop
Integrated “super” information systems: 5S: streams, structures, spaces, scenarios, societies
Ubiquitous, Higher Quality, Lower Cost
Education, Knowledge Sharing, Discovery
Disintermediation -> Collaboration
Universities Reclaim Property
Interactive Courseware, Student Works
Scalable, Sustainable, Usable, Useful

DL-Related Timeline
[Timeline figure, 1985 to 2000; labels: WWW, Scholarly EPub in U’s, xxx, CSTR, OAI, CoRR, NCSTRL, PDF, SGML, XML, MPEG-7, JPEG, MPEG, PCs, TEI, Proposed Ugrad DL, HyperCard, Hypertext Conf., ETDs, DLI, DLI2, NSDL, Java, DC, NDLTD, RDF]

Core of DL
Collecting
– Authoring, Repositories, Archives, Museums, …
Organizing
– Packaging of Data and Metadata, Storing
– Naming/Identifying and Cataloging
– Classification, Clustering, …
Serving
– Indexing, Linking, Summarizing, Visualizing
– Browsing, Accessing, Searching, Filtering, Retrieving, Distributing, Using, …

DL Components
[Figure labels: Gateways; MM/HT Renderer; User Interfaces; Workflow Mgr; Search Engines, Classifiers, …; DBMS; Rights Mgr; Data, MM Info; Repository]

Digital Libraries Shorten the Chain from
Editor, Reviewer, Publisher, A&I, Consolidator, Library

DL = Users Direct (Organized Artifact Mediated Communication)
Author Teacher Digital Reader Learner Reviewer Editor Dr.
Library Patient Librarian

Benefits
Ease of use
Effectiveness
“The benefits of digital libraries will not be appreciated unless they are easy to use effectively.” - IITA Workshop report

Outline
Grand Challenge
Scaling / Technology
Framework, Theory
Simplification: DC, OAI
Example Applications

PetaPlex Top View / PetaPlex Side View
[Figures: top view, 4 ft. side; side view, 15 shelves, 8 ft. high, 4 ft. wide. Roles: support, cooling, power]

PetaPlex Complex
[Figure labels: front end machine (RS/6000, 1G RAM, 4 Proc.); Service Machines 1 to 4; array of Nanoservers]

PetaPlex Digital Library Machine (“super” object store):
Parallel computer / storage utility
Research: inverted files, video server, …
Knowledge Systems Incorporated is supplying VT-PetaPlex-1 with 2.5 terabytes through 100 nodes:
Net connection + 25GB disk + 233 MHz Pentium + Linux

Structured Video Browser (making video into hypermedia)
www.learn.umd.edu
IBrowse
Expository multimedia
Narrative Structures

MPEG-7 Image Library Systems Tech.
[Architecture figure with numbered flow steps 1 to 5; labels: Users, Web Search Engines, WWW, Web Server, Servlet Engine, Servlets, MPEG-7 Description Module, Search Server, OS, DB]
Information and Communication University (ICU)

MPEG-7 Video Library Systems Tech.
Architecture
[Architecture figure; labels: Video Data, Description Generator, Description Scheme, Description Schemes Design Tool, Player, Video Database, Retrieval Server Module, Presentation Module, Meta Database]
Information and Communication University (ICU)

LMDS offers a LOT of bandwidth (comparison to previous auctions)
[Bar chart labels: LMDS, MMDS, DBS, PCS A-C Block]
LMDS is:
- 1300 MHz in two “Blocks” (28-31 GHz)
- Over 2X bandwidth of AM/FM radio, VHF/UHF television, and Cellular telephone combined.
- More than sum of previous 16 auctions
[Bar chart labels, continued: Cellular Unserved, Digital Audio Radio Service, PCS D-F Block, Wireless Communications Service, Interactive & Video Data; horizontal axis: MHz, 0 to 1200]

SPIRE Visualization

CAVE-ETD
CAVE-ETD is a simulation of a library that runs in a CAVE (VR environment).
Populated with a subset of ETD records.
[Figure labels: room (x4), Main Foyer, Reading, Book, Abstract]

Integrated CCLINC Translingual Information System
DARPA CCLINC SERVER
Translation:
It seems that North Korea launch a missile again
After North Korea launched a Daipodong missile last month, NK is perceived to proceed to an additional test launch. Korea, US and Japan enter into an alert state, and prepare for a joint response policy. Korea estimates that the additional launch will be on 09/05. Japan estimates that NK’s missile range is short. US information says that there is no sign of launch yet.

Outline
Grand Challenge
Scaling / Technology
Framework, Theory
Simplification: DC, OAI
Example Applications

Definitions
Library ++ (library+archive+museum+…)
Distributed information system + organization + effective interface
User community + collection + services
Digital objects, repositories, IPR management, handles, indexes, federated search, hyperbase, annotation

Definition: Digital Libraries are complex systems that
help satisfy info needs of users (societies)
provide info services (scenarios)
organize info in usable ways (structures)
present info in usable ways (spaces)
communicate info with users (streams)

5S Layers
Societies
Scenarios
Spaces
Structures
Streams

Definition: 5S Framework
Societies: interacting people (and computers)
Scenarios: services, functions, operations, methods
Spaces: domains + constraints (e.g., distance, adjacency): 2D, vector, probability
Structures: relations, trees, nodes and arcs
Streams: sequences of items (text, audio, video, network traffic)
(5 Element System: Fire, Wood, Earth, Metal, Water)

5S: Combinations
Societies + Scenarios = user model
Societies +
Scenarios + Spaces = user interface
Streams + Structures = markup
Streams + Structures + Scenarios = object
Structures + Scenarios = DBMS

Outline
Grand Challenge
Scaling / Technology
Framework, Theory
Simplification: DC, OAI
Example Applications

Complex to Simple
MARC ($50)
Dublin Core (DC)
Author’s tools
www.physik.uni-oldenburg.de/EPS/mmm

DL Components
[Figure labels: Gateways; MM/HT Renderer; User Interfaces; Workflow Mgr; Search Engines, Classifiers, …; DBMS; Data, MM Info; Rights Mgr; Repository]

Open Archives Initiative
OAI
www.openarchives.org
openarchives@openarchives.org

Original Open Archives Members
American Physical Society; California Digital Library; Caltech; Coalition for Networked Info.; Cornell University; Harvard University; Library of Congress; Los Alamos Nat’l Lab; Mellon Foundation; NASA Langley Research Cntr; Old Dominion University; Stanford University; U. of Ghent; U. of Surrey; U. of Southampton; Vanderbilt University; Virginia Tech; Washington University

Approaches to Open Archives
Build By Institution
Build By Discipline
Author, Category, Interdisciplinary, Year, Language, Query, …

OAI Philosophy
Self-archiving = submission mechanism
Long-term storage system = archive
Open interface = harvesting mechanism
Data provider + service provider
Start with “gray literature”
– e-prints/pre-prints, reports, dissertations, …

Archive of Digital Objects
[Figure labels: Archive, Access Protocol, Handle (ID), terms and conditions, Digital object]

OAI – Repository Perspective
Required: Protocol
[Figure: eight MDO items and four DO items]

OAI – Black Box Perspective
[Figure labels: OA 1 to OA 7, Black Box]

OAI-ETD Perspective
…, www.theses.org, BN.PT (Portugal), SEALS (S.Africa), OhioLINK, Dissert.Online (Germany), CBUC (Catalunya), CIC, …, VT, CyberTheses (Francophone), NDC (Greece), MIT, ISTEC (Ibero America), U.
Bergen (Norway), ADT (Australia), PhysDis, NSYSU (Taiwan)

CS Teaching Center (CSTC)
Collection of reviewed online resources used to aid in teaching of Computer Science
Supports author submission and peer-review process for new ACM Journal of Educational Resources In Computing (JERIC)
Connected with NSDL (NSF 00-44)
http://www.cstc.org

W3C Web Characterization Repository
Online database of metadata related to publications, tools and data sets dealing with Web characterization
Project of the Web Characterization Activity working group of the World-Wide-Web Consortium (www.w3c.org/WCA)
http://purl.org/net/repository

OAI Repository Explorer
Serves as a compliance test
Allows browsing of open archives using only OAI protocol
Sends requests on behalf of user, parses and checks responses and displays browsable interface
Will detect most discrepancies in protocol
http://purl.org/net/explorer

Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models

Figure 1. Layers Related to Open Archives Initiative
[Figure labels, in extraction order: Services; Citation / Linking; Authoring; Submission; SFX; Editorial: Reviewing, Certification; CiteSeer; Summarization; Metadata Creation; Registry; Citation Checking; Archives: Name, ID, Description, Terms and Conditions, …; Text/MM Editing; Citation DB; Updating; Authority Control; Preservation; Conversion; Metadata Formats: Name, XML DTD, …; Gazetteer; Cataloging; Copy-Edit / Add Value; Archive Formats: Name, Standard, Preservation Process, …; Search/Browse; Protocols; Annotation; Collaboration; Services; Tools; Repository]

Repository for NDLTD
Metadata Formats: OA Metadata Set, NDLTD Standard (DC-based) Set
Transaction Log
Training Resources
Open Archives Harvesting Protocol
[Figure labels: VT Partition, Record (Metadata), Record (Full Content), NCSTRL Repository, UVA Partition, Metadata, Content, EconWPA Repository, Caltech Partition, Metadata, Content, RePEc Repository]

Outline
Grand Challenge
Scaling / Technology
Framework, Theory
Simplification: DC, OAI
Example Applications
(6 slides from
Lee Zia, NSF)

Presidential Directive - 12/17/1999
Subject: Use of Information Technology to Improve Our Society
“13. The Secretary of the Smithsonian Institution, the Director of the National Science Foundation, the Director of the National Park Service, and the Director of the Institute of Museum and Library Services shall work with the private sector and cultural and educational institutions across the country to create a Digital Library of Education to house this country's cultural and educational resources.”

Programmatic History
NSDL Program NSF: FY 00-02
DL Operational Fall, 2002
DLs & UG Earth Systems Education initiated FY 99, continuing
DLI 2 Special Emphasis in Undergrad Education FY 98-99
DLI 2 - NSF, et al., initiated in FY 98, continuing
Digital Libraries Initiative (DLI 1) - NSF/NASA/ARPA, FY 94-97

Vision
A Learning Environments and Resources Network for SMET Education (LEARNS)
Designed to meet the needs of learners, in both individual and collaborative settings
Constructed to enable dynamic use of a broad array of materials for learning, primarily in digital format
Managed actively to promote reliable anytime anywhere access to quality collections and services, available both within and without the network
(from www.nsf.gov/nsdl)

“The network is the library.”
LEARNS Connects:
Users: students, educators, life-long learners
Content: structured learning materials; large realtime or archived datasets; audio, images, animations; primary sources; digital learning objects (e.g. applets); interactive (virtual, remote) laboratories; ...
Tools: search; refer; validate; integrate; create; customize; publish; share; notify; collaborate; ...
Expectations of Tracks
Core Integration: to coordinate a distributed alliance of resource collection and service providers, and to ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
Collections: to aggregate and actively manage a subset of the digital library’s content within a coherent theme or specialty
Services: to increase the impact, reach, efficiency, and value of the digital library in its fully operational form
Targeted Research: to have immediate impact on one or more of the other three tracks

Selected DL2 Ugrad Projects/Topics
UNCW, Eduprise, TCNJ, IMS, VT, … (iLumina Project): CS, Math, Viz., …
Columbia University: Earth sciences
Stanford University: Medicine (images)
U. California Berkeley: Engineering
University of Maryland: K-12 education
U. Texas at Austin: Physical anthropology

Tracks & 29 Projects
6 Core Integration: Columbia, Cornell, E.Michigan/MERIT, UCAR, UCB, UMissouri/NCSA (Biology, Eng., Teacher Ed.)
13 Collections: Atmosphere, Biology, Biosciences, Earth Systems, Engineering, Health Sciences, Math
9 Services: Competitive Intelligence, Component Environment, Earth Systems J., Metadata NLP, Managing LOs, Peer Review, Video
1 Targeted Research: Paths

NSDL Spine
[Figure labels: Portals & Clients; NSDL Services; Other NSDL Services; full-service collections; NSDL Collections; Referenced Items & Collections; Core Collection-Building Services; Usage Services; CI Services; harvesting; persistence; protocol mediation; annotation; query transform; topic-map; registry; personalization; discussion]
(Slide from Dave Fulker, Bill Arms – 11/2/2000)

CS Teaching Center (CSTC)
Instead of building large, expensive multimedia packages, that become obsolete and are difficult to re-use,
concentrate on small knowledge units.
Learners benefit from having well-crafted modules that have been reviewed and tested.
Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
ACM Education Board and SIG support, new NSF grant with COLLEGIS Research Institute and others …

Browsing (1)
Browsing (2)

A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations http://www.ndltd.org
(NDLTD – remember: ND LTD / NDL TD)
(also, newer NUDL: Networked University Digital Library, with e-courseware, etc.)
[Figure labels: Grad Program, Library, IT, Ed. (Tech)]

The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative

What are the long term goals?
Attract all TDs/yr: 50K D-US, 25K D-Germany, 10K TD-Canada, … >200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …)
Dramatic increase in knowledge sharing: literature reviews, bibliographies, …
Services providing lifelong access for students: browse, search, prior searches, citation links
Hundreds/thousands of downloads / year / work

Student Defends & Finalizes ETD
Student Gets Committee Signatures and Submits ETD
Graduate School Approves ETD, Student is Graduated
Library Catalogs ETD, Access is Opened to the New Research
[Figure labels: My Thesis ETD; Signed; Grad School; Ph.D.; WWW; NDLTD]

User Search Support (multilingual, XML)
NDLTD World Federated Search User Interface
Virginia Tech ... (univ) Dissertations Online (Germany) OhioLink Portuguese NL ...
(lib / univ group) (national lib) Australia (regional) OAS, ISTEC (Latin America)
Note: All groups shown are connected with NDLTD.

Access Possibilities
[Figure labels: Web search engines; www.theses.org; Virginia Tech; MIT; National Library of Portugal; www.openarchives.org; library catalog clients; CBUC (Spain); Ohio Link; 3rd Party Services (e.g., UMI); National Projects: AU, GE, …]

Status of the Local Project
Approved by university governance Spring 1996; required starting 1/1/97
Submission & access software in place
Submission workshops for students (and faculty) occur often: beginner/adv.
Faculty training as part of Faculty Development Initiative
Over 3000 ETDs in collection – some have audio, video, large images, software, …

US University Members (44)
Air University (Alabama)
Baylor University
Brigham Young University (part, whole)
Caltech
Clemson University
College of William & Mary
Concordia University (Illinois)
East Carolina University
East Tenn. State U. – required fall 2000
Florida Institute of Technology
Florida International University
George Washington University
Louisiana State University
Marshall University (W. Va.)
Miami University of Ohio
Michigan Tech
Mississippi State University
MIT
Naval Postgraduate School (CA)
New Mexico Tech
North Carolina State University
Penn. State University
Rochester Institute of Tech.
U. of Colorado Health Science Center
U. of Florida
U. of Georgia
University of Hawaii, Manoa
U. of Iowa
U. of Kentucky
U. of Maine
U. of North Texas – required since 8/99
U. of Oklahoma
U. of South Florida
U. of Tennessee, Knoxville
U. of Tennessee, Memphis
U. of Texas at Austin – required in 2001
U. of Virginia
U. Wisconsin - Madison
Vanderbilt U.
Virginia Commonwealth U.
Virginia Tech - required since 1/97
West Virginia U. - required fall 1998
Western Michigan U.
Worcester Polytechnic Inst.
OhioLINK Statewide Consortium
Represents 79 colleges, universities, libraries
Public Universities
Private Universities and Colleges
2-Year Colleges
Only a few (e.g., Miami U. of Ohio) are also NDLTD members on their own

National / Regional Projects
Australia
– U. New South Wales (lead)
– U. of Melbourne
– U. of Queensland
– U. of Sydney
– Australian National U.
– Curtin U. of Technology
– Griffith U.
Germany
– Humboldt University (lead)
– 3 other universities
– 5 learned societies: Math, Physics, Chemistry, Sociology, Education
– 1 computing center
– 2 major libraries
Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es:
– Universitat de Barcelona
– Universitat Autònoma de Barcelona
– Universitat Politècnica de Catalunya
– Universitat Pompeu Fabra
– Universitat de Girona
– Universitat de Lleida
– Universitat Rovira i Virgili
– Universitat Oberta de Catalunya
– Biblioteca de Catalunya
South Africa: ECHEA/SEALS
India, Portugal, …

Other Countries with Members
Belgium, Brazil, Canada, Germany, Hong Kong, India, Italy, Korea, Mexico, Netherlands, Norway, Russia, Singapore, S. Africa, S. Korea, Spain, Taiwan, UK

ETD Initiative (and UMI)
Students
Learn about DL, EPub
TDs become more expressive
Global
TDs become more accessible, archived
Universities
UMI
N. Amer.
(T)Ds are accessible, archived

[Figure labels: Convene Local Planning Group; ETD; Build Local ETD Site; ETD Workshop/Training; Digital Library Policies; Inspection/Approval]

Responsibilities
Handle local education and collection
– Contact information for helpers
– Archive
Utilize standards
– Metadata: MARC / DC-based consensus specification
Share metadata
– Union services, mirrored services
Allow access
– www.theses.org / www.dissertations.org
– Open Archives Initiative (www.openarchives.org)

MARIAN Layers
[Figure labels: Users; User Interface Layer; User Information Layer; Search Engine Layer; Database Layer]

MARIAN/DEByE Mediation Middleware
[Figure labels: User; Search Services; Recommendation Services, etc; Analysis; Indexing; Linking; 5SL Source Description; NDLTD/NUDL/Digital Library User; Fusion Layer; Wrapper Generator; Additional Evidential Information; Belief Network Layer; Local Data Store; Queries + Results; wrappers for Dublin Core, SOIF, MARC, RFC1807; Harvest protocol; Open Archives protocol; Z39.50 protocol; Dienst protocol; German PhysDis Collection; VT OAI Collection; Greek Hellenic Dissertations Collection; MIT ETD Collection]

Remember
Grand Challenge
Scaling / Technology
Framework, Theory
Simplification: DC, OAI
Example Applications

Conclusions
Consider DLs: to use, to teach, to add to, to build
Education is one important application of DLs
Cultural heritage, linguistic diversity, new knowledge – all are important to preserve
Technology opens up exciting opportunities in DLs to yield seamless “super” information systems
Having a framework and theory may lead to better (more effective) systems and broader applicability
Interoperability is part of the DL grand challenge

URLs
http://fox.cs.vt.edu
http://ei.cs.vt.edu/~dlib (Courseware)
http://www.dlib.org (D-Lib Magazine)
www.smete.org and later www.nsf.gov/nsdl
www.ndltd.org and www.theses.org
www.cstc.org (CSTC and JERIC)
www.openarchives.org
www.jcdl.org (JCDL’2001 – June 24-28)
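
Sketch: data provider + service provider
The OAI slides describe archives that act as data providers (self-archiving plus an open harvesting interface) and service providers that build union services from harvested metadata. The short Python sketch below illustrates that split with in-memory objects only. It is not the Open Archives harvesting protocol itself: the class names, method names, and sample records are all hypothetical, and the only detail taken from a standard is the list of 15 Dublin Core element names.

```python
from dataclasses import dataclass, field

# The 15 Dublin Core element names. Every class, method, and record below is
# hypothetical and exists only for this illustration.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


@dataclass
class DataProvider:
    """An archive that accepts self-archived records and exposes their metadata."""
    name: str
    records: dict = field(default_factory=dict)  # identifier -> DC metadata

    def submit(self, identifier, **metadata):
        # Self-archiving: reject anything that is not a Dublin Core element.
        unknown = set(metadata) - DC_ELEMENTS
        if unknown:
            raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
        self.records[identifier] = dict(metadata)

    def list_records(self):
        # Open interface: hand out metadata copies only, never full content.
        return [(identifier, dict(md)) for identifier, md in self.records.items()]


class ServiceProvider:
    """A union catalog built purely from harvested metadata."""

    def __init__(self):
        self.index = {}  # lowercase title word -> set of (archive name, identifier)

    def harvest(self, provider):
        for identifier, metadata in provider.list_records():
            for word in metadata.get("title", "").lower().split():
                self.index.setdefault(word, set()).add((provider.name, identifier))

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))


# Two hypothetical archives with made-up records.
vt = DataProvider("VT")
vt.submit("etd-1", title="Digital Library Interoperability", creator="A. Student")
mit = DataProvider("MIT")
mit.submit("thesis-7", title="Video Library Indexing", creator="B. Student", language="en")

union = ServiceProvider()
for archive in (vt, mit):
    union.harvest(archive)

print(union.search("library"))  # [('MIT', 'thesis-7'), ('VT', 'etd-1')]
```

The service provider never touches full content, which mirrors the "metadata harvesting" tier of the interoperability model: each archive keeps its own digital objects, and only descriptions are shared.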