Digital Libraries: From Theory to Applications in Education and Business
ICADL 2000 – Seoul, Korea, December 7, 2000
Edward A. Fox
fox@vt.edu
http://fox.cs.vt.edu
CS, DLRL, Internet TIC
Virginia Tech, Blacksburg, VA, USA

Outline
Introduction (5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions

Acknowledgements (Selected)
Conference Organizers and Sponsors
Mentors: JCR Licklider, Michael Kessler, Gerard Salton
Sponsors: Advance Auto Parts, CNI, DLF, IBM, NLM, NSF, OCLC, UNESCO, US Dept. of Ed. (FIPSE), …
VT Faculty/Staff: Tony Atkins, Debra Dudley, John Eaton, Jim Hicks, Lance Matheson, Gail McMillan, James Powell, …
VT Students: Fernando Das Neves, Robert France, Marcos Goncalves, Neill Kipp, Paul Mather, Ryan Richardson, Ohm Sornil, Hussein Suleman, Omar Vasnaik, Marc Vass, …
Visitors: Mann-Ho Lee (Korea), Byongsun Kim (Korea), Shalini Urs (India), Akira Maeda (Japan)

Internet Technology Innovation Center
Supported by Virginia’s Center for Innovative Technology
Statewide University Partners - Governing Board:
Christopher Newport University
– William Winter, William Muir, Virginia Electronic Commerce Technology Center / Southeastern Virginia Network (VECTEC/SEVAnet)
George Mason University
– Scott Martin, Internet Multimedia Center (ICM)
– Steven Ruth, International Center for Applied Studies in IT (ICASIT)
University of Virginia
– Alf Weaver, Internet Commerce Group (InterCom)
– Jim French, Internet Digital Library
Virginia Tech
– Edward Fox, Digital Library Research Laboratory (DLRL), CC, CS
– Scott Midkiff, Center for Wireless Telecomm. (CWT), VTISC, ECpE

JCDL 2001
First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg)
http://www.jcdl.org
June 24-28, 2001 in Roanoke, VA
Conference Committee:
General Chair: Edward A. Fox, Virginia Tech
Program Chair: Christine Borgman, UCLA
Treasurer: Neil Rowe, Naval Postgraduate School
Posters Chair: Craig Nevill-Manning, Rutgers U.
URLs
http://fox.cs.vt.edu
http://www.dlib.vt.edu (DLRL)
http://ei.cs.vt.edu/~dlib (Courseware)
www.ndltd.org & www.theses.org
www.cstc.org (CSTC and JERIC)
www.openarchives.org (OAI)
www.jcdl.org (JCDL’2001 – June 24-28)

Collaboration!
U.S. – Korea Joint Workshop on Digital Libraries
San Diego Supercomputer Center, August 10 & 11, 2000
Sponsored by:
– National Science Foundation, USA
– Ministry of Information & Communication, Korea
– Institute of Information Tech. Assessment, Korea
– San Diego Supercomputer Center
– University of Maryland
– Virginia Tech

Workshop Participants (1 of 3)
Robert Allen (University of Maryland) rba@GLUE.UMD.EDU
Dookwon Baik (Korea University) baik@SWSYS2.KOREA.AC.KR
Ching-Chih Chen (Simmons College, Boston) chen@SIMMONS.EDU
Su-Shing Chen (University of Missouri - Columbia) schen@ECN.MISSOURI.EDU
Jonghoon Chun (Myongji University) jchun@WH.MYONGJI.AC.KR
Gregory Crane (Tufts University) gcrane@PERSEUS.TUFTS.EDU
Lois Delcambre (Oregon Graduate Institute) lmd@CSE.OGI.EDU
Edward Fox (Virginia Tech) fox@VT.EDU
Michael Gertz (University of California, Davis) gertz@CS.UCDAVIS.EDU
Stephen Helmreich (New Mexico State University) shelmrei@CRL.NMSU.EDU

Workshop Participants (2 of 3)
Ulf Hermjakob (USC Information Sciences Institute) ulf@ISI.EDU
Soon Joo Hyun (Information & Communications University, ICU) shyun@ICU.AC.KR
Hyeon Kim (Korea Research & Development Information Center) hyeon@KORDIC.RE.KR
Sung-Hyuk Kim (Sookmyung Women’s University) ksh@SOOKMYUNG.AC.KR
Yongchae Kim (Ministry of Information & Communication) yongari@MIC.GO.KR
Ron Larsen (University of Maryland) rlarsen@DEANS.UMD.EDU
Sang-goo Lee (Seoul National University) sglee@MARS.SNU.AC.KR
Sang Ho Lee (Soongsil University) shlee@COMPUTING.SOONGSIL.AC.KR
Young-Suk Lee (MIT, Lincoln Laboratory) ysl@SST.LL.MIT.EDU
Karl Lo (University of California, San Diego) klo@UCSD.EDU

Workshop Participants (3 of 3)
Bruce Miller (University of California, San Diego) Rbmiller@UCSD.EDU
Sung Been Moon (Yonsei University) sbmoon@YONSEI.AC.KR
Reagan Moore (San Diego Supercomputer Center) moore@SDSC.EDU
Sung Hyon Myaeng (Chungnam National University) shmyaeng@CS.CHUNGNAM.AC.KR
Gang-Tak Oh (National Computerization Agency, Seoul) okt@NCA.OR.KR
Sam-Gyun Oh (SungKyunKwan University) samgyun@YAHOO.COM, samoh@YURIM.SKKU.AC.KR
Hae-Chang Rim (Korea University) rim@NLP.KOREA.AC.KR
Shalini Urs (University of Mysore) shaliniurs@HOTMAIL.COM
Lee Zia (National Science Foundation) lzia@NSF.GOV

Some Observations
So many conferences! Lots of R&D!
Exhibits: a DL industry is emerging.
But:
– we don’t cite each other’s works;
– nobody is asking “Why”;
– we are not connecting theory + projects;
– nobody is talking about OAI.
So, I’ve redone my talk, since you can see:
– paper in proceedings
– demo tomorrow (p. 327) and online
– tutorial notes (in book) and online

DL = Users Direct (Organized Artifact Mediated Communication)
[Figure: roles around the Digital Library: Author, Teacher, Reader, Learner, Sponsor, Reviewer, Editor, Publisher, Librarian]

DL = Users Direct (Organized Artifact Mediated Communication)
[Figure: business roles around the Digital Library: Parts Supplier, Inventory, Sales Agent, Training, Shopper, Repair Garages, Store, Manuals, Home, Staff, Sales Partners; B2C and B2B links]

CS 6604: Digital Libraries (Fall 2000)
http://scholar.lib.vt.edu/imagebase/
DL of Images of Birds for Virginia Tech Museum of Natural History
Student Team: Ameya Datey, Aniket Sule, Supriya Angle, Balaprasuna Chennupati, and the Eagle Scouts
Under the guidance of Dr. Edward Fox, Ms. Llyn Sharp (VT Museum of Natural History), and Mr. Anthony Atkins (Digital Library and Archives)
Plus, 3-D VTMNH minerals in UH3004

Libraries of the Future (JCR Licklider, 1965, MIT Press)
Unified Theory?
Not ready in the 1960s
Analog: unified field theory in physics
“Mess” today
– segmented field, specialities
– Database <-> Knowledge <-> Content Mgmt.
– Multimedia, Hypermedia, Hypertext
– Logic, Algebra, Artificial Intelligence, …
Expensive, annoying for users
– Don’t know where to look
– Don’t know how to use services

5S Layers: Societies, Scenarios, Spaces, Structures, Streams

Definition: Digital Libraries are complex systems that
– help satisfy info needs of users (societies)
– provide info services (scenarios)
– organize info in usable ways (structures)
– present info in usable ways (spaces)
– communicate info with users (streams)

Definition: 5S Framework
Societies: interacting people (and computers)
Scenarios: services, functions, operations, methods
Spaces: domains + constraints (e.g., distance, adjacency): 2D, vector, probability
Structures: relations, trees, nodes and arcs
Streams: sequences of items (text, audio, video, network traffic)
(5 Element System: Fire, Wood, Earth, Metal, Water)

5S: Combinations
Societies + Scenarios = user model
Societies + Scenarios + Spaces = user interface
Streams + Structures = markup
Streams + Structures + Scenarios = object
Structures + Scenarios = DBMS

Outline
Introduction (5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions

NSDL Spine
[Figure: Portals & Clients; NSDL Services and Other Services; NSDL Collections, both full-service collections and referenced items & collections; Core Collection Building Services (harvesting, persistence, protocol mediation); Core Collection Usage / CI Services (annotation, query transform, topic-map, registry, personalization, discussion)]
(Slide from Dave Fulker, Bill Arms – 11/2/2000)

ARIADNE Screens (E. Duval)

CS Teaching Center (CSTC)
Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
Learners benefit from having well-crafted modules that have been reviewed and tested.
Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
ACM Education Board and SIG support; new NSF grant with UNCW, Eduprise, TCNJ, … (iLumina Project)
ACM J. of Educational Resources in Computing (JERIC)
[Screenshots: Browsing (1), Browsing (2)]

A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations, http://www.ndltd.org (NDLTD – remember: ND LTD / NDL TD)
(also, newer NUDL: Networked University Digital Library, with e-courseware, etc.)

ETD Initiative (and UMI)
[Figure: Students, Universities, UMI, Library Catalogs, the WWW, and NDLTD. Labels: Students learn about DL, EPub; TDs become more expressive; Global TDs become more accessible, archived; N. Amer. (T)Ds are accessible, archived; Access is opened to the new research]

What are the long term goals?
– Attract all TDs/yr: 50K D (US), 25K D (Germany), 10K TD (Canada), …; >200K/yr
– Rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …)
– Dramatic increase in knowledge sharing: literature reviews, bibliographies, …
– Services providing lifelong access for students: browse, search, prior searches, citation links
– Hundreds/thousands of downloads / year / work

The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Training Authors, Expanding Access, Preserving Knowledge, Improving Graduate Education, Enhancing Scholarly Communication, Empowering Students & Universities
Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative

Outline
Introduction (5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions

Why do we need the Open Archives Initiative?
– Current standards are too complicated
– Information wants to be free!
– We can decouple
  – Running an archive (DL content collection)
  – Running a service (DL system / operation)
– So we can have more and better archives, that build on each other
– So we can have better services, that work on multiple collections

OAI: Archives of Digital Objects
[Figure: an archive of digital objects, each with a handle (ID) and terms and conditions, reached through an access protocol]

The Open Archives Initiative (www.openarchives.org): a technical introduction
Hussein Suleman (hussein@vt.edu)
Virginia Tech DLRL, December 2000

History
Santa Fe Convention (October 1999)
– Electronic pre-print community
San Antonio (July 2000), Lisbon (Sept. 2000)
– Broader interest from other parties
Ithaca Meeting (September 2000)
– Formulation of general-purpose protocol
OAI Open Meetings (January–February 2001)
– Public release of specifications

Federation vs. OAI Harvesting
Federation
– Sending out queries to remote sites and combining results
Harvesting
– Gathering all metadata from remote sites into a central search system
– Lightweight protocol
– Robust
– Less network traffic
– Redundant servers
[Figure: Black Box]

OAI-ETD Perspective
[Figure: www.theses.org together with ETD archives: BN.PT (Portugal), SEALS (S. Africa), OhioLINK, Dissert.Online (Germany), CBUC (Catalunya), CIC, VT, CyberTheses (Francophone), NDC (Greece), MIT, ISTEC (Ibero America), U. Bergen (Norway), ADT (Australia), PhysDis, NSYSU (Taiwan), …]

Splitting Data & Services
Data Provider
– Implements the OAI protocol on an archive to allow external access to its data
Service Provider
– Uses the OAI protocol to access external archives and provide services (such as searching or linking) on their metadata

The Big Picture
[Figure: a DL built over Repository 1, Repository 2, Repository 3, Repository 4]

Requirements for OAI Protocol
– Unique identifiers (URNs) for each record
– Date-stamp for each record, set when it was last modified/created/deleted
– HTTP server with scripting ability

OAI Harvesting Protocol v1
– Operates over HTTP
– HTTP requests and XML responses
– HTTP error codes
– 6 service requests (verbs):
  – Identify, ListMetadataFormats, ListSets
  – ListIdentifiers, GetRecord, ListRecords

[Screenshots: Identify - Response; ListMetadataFormats - Response; GetRecord - Response]

Verb: ListRecords
Retrieves metadata for multiple records
Parameters (R = required, O = optional, X = exclusive):
– from: start date (O)
– until: end date (O)
– set: set to harvest from (O)
– resumptionToken: flow control mechanism (X)
– metadataPrefix: metadata format (R)

[Screenshots: ListRecords - Response; Feature: Different Metadata; Feature: Date Ranges; Feature: Resumption Token; Repository Explorer; ODU Search Service]

What Next?
In General
– Cross-archive searching
– Cross-archive linking, de-duping, threading
– Selective filtering
– Open-DL in a Box?
VT
– The VT Digital Library
– NDLTD Union Catalog

the Open Archives Initiative
Herbert Van de Sompel
Cornell University -- Computer Science
[acknowledgements] Carl Lagoze
DLF FALL FORUM 2000 – Chicago – November 18th 2000

Actions
• establish organizational stability for the OAI:
  • institutional backing from CNI & DLF
  • steering committee: policy guidance
  • technical committee: technical specifications
  • executive group: day-to-day coordination
  • workshops: public dissemination, feedback
• revise specifications to allow adoption beyond preprints
herbert van de sompel

low-barrier interop
[Figure: a low-barrier interoperability umbrella over e-print, FTXT, OPAC, A&I, and image collections; a second build shows the shared metadata: Author, Title, Abstract, Identifier]

OAI harvesting
[Figure: service provider (harvester) and data provider (repository); labels: Datestamp, Identifier, Set, Records]

revision of specifications
• publication of specifications:
  • January 2001
  • US Open Day, January 23rd, Washington DC
  • EC Open Day, February 2001, Berlin
• freeze specifications for 1 year:
  • stable for experimentation; not definitive
  • minimize risk for early adopters
  • maximize chances for future interoperability across communities

alpha test of specs (11/2000-01/2001)
• data providers:
  • arXiv -- Los Alamos
  • NACA -- NASA
  • CogPrints -- U Southampton
  • ETD -- Virginia Tech
  • Theses & Dissertations from WorldCat -- OCLC
  • HeinOnline law journals -- Cornell U
  • TEI-lite collection -- U Tennessee
  • STM publisher metadata -- U Illinois
  • Resource Discovery Network -- UKOLN
  • Open Language Archives -- U Pennsylvania
  • Open Video Project -- U North Carolina
  • Museum info. -- CIMI
• software:
  • OAI harvesting interface to Ex Libris Aleph 500 Integrated Library System -- Ex Libris
  • OAI harvester -- Cornell U
  • OAI harvester -- Virginia Tech
  • Open-source software capable of creating a merged catalog of metadata harvested from OAI servers -- OCLC
• service providers:
  • Repository explorer -- Virginia Tech
  • MARIAN DL -- Virginia Tech
  • ARC service -- Old Dominion U

New OAI mission statement
The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. The Open Archives Initiative has its roots in an effort to enhance access to e-print archives as a means of increasing the availability of scholarly communication. Continued support of this work remains a cornerstone of the Open Archives program.
The fundamental technological framework and standards that are developing to support this work are, however, independent of both the type of content offered and the economic mechanisms surrounding that content, and promise to have much broader relevance in opening up access to a range of digital materials. [...]

Harvesting Document Metadata for Federated Search
CS6604 Fall 2000 Project
Presented by Avnish Kumar Chhabra

Benefits of Harvesting
– Limited storage requirement
– Fast search
– Consistently ranked results
– Improved reliability
– Distributed collections are transparent to the user
– Efficient use of network resources
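To make the harvesting side concrete, here is a minimal Python sketch of an OAI ListRecords harvester that follows the argument rules on the ListRecords slide: metadataPrefix is required, from/until/set are optional, and resumptionToken is exclusive. It is a sketch under stated assumptions, not the CS6604 harvester or any official OAI tool. The `fetch` callable (a stand-in for an HTTP GET), the default prefix `oai_dc`, the helper names, and the example base URL are all hypothetical, and the XML is read without namespaces for simplicity. The loop keeps requesting pages until a response carries no non-empty resumptionToken, which is how the protocol splits large result lists.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional
from urllib.parse import urlencode

# ListRecords argument rules from the slide:
# R = required, O = optional, X = exclusive (must be the only argument).
REQUIRED = {"metadataPrefix"}
OPTIONAL = {"from", "until", "set"}
EXCLUSIVE = {"resumptionToken"}


def list_records_url(base_url: str, **args: str) -> str:
    """Build a ListRecords request URL after checking the R/O/X rules."""
    keys = set(args)
    if keys & EXCLUSIVE:
        # A resumptionToken continues an earlier request, so it stands alone.
        if len(keys) != 1:
            raise ValueError("resumptionToken must be the only argument")
    else:
        missing = REQUIRED - keys
        if missing:
            raise ValueError(f"missing required argument(s): {sorted(missing)}")
        unknown = keys - REQUIRED - OPTIONAL
        if unknown:
            raise ValueError(f"unknown argument(s): {sorted(unknown)}")
    query = urlencode({"verb": "ListRecords", **args})
    return f"{base_url}?{query}"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so the sketch ignores XML namespaces."""
    return tag.rsplit("}", 1)[-1]


def harvest(base_url: str,
            fetch: Callable[[str], str],
            metadata_prefix: str = "oai_dc",
            from_date: Optional[str] = None) -> Iterator[ET.Element]:
    """Yield every <record> element, following resumptionTokens page by page."""
    args: Dict[str, str] = {"metadataPrefix": metadata_prefix}
    if from_date is not None:
        args["from"] = from_date  # incremental harvest: only newer datestamps
    url = list_records_url(base_url, **args)
    while True:
        root = ET.fromstring(fetch(url))  # fetch stands in for an HTTP GET
        token = None
        for elem in root.iter():
            name = local_name(elem.tag)
            if name == "record":
                yield elem
            elif name == "resumptionToken" and (elem.text or "").strip():
                token = elem.text.strip()
        if token is None:
            return  # no (or empty) token: the list is complete
        url = list_records_url(base_url, resumptionToken=token)
```

Passing `from_date` asks the data provider only for records whose datestamp is on or after that date, which is what lets a scheduled harvester pick up just new or changed metadata on each run instead of re-fetching the whole archive.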
Design of the Solution
[Figure: components: Digital Library collection with OAI wrapper; New Metadata; Parser/Updater; Update Scheduling; MARIAN Metadata Database; Query Generation; Z39.50 Wrapper (Queries, Replies); Boundary of System developed]

Implementation
[Figure: Schedule File (Server, Protocol, Update Frequency); SiteInfo; main scheduler thread; OAI harvester class OAIInterface; HarvestorMonitor, a monitor for arbitrating access to network resources; OAIHandler, an XML document event handler class; instantiated with the URL of an OAI site and a scheduling frequency; DL Collection; Abs, Sub, Auth]

Features of the system developed
– Per-collection execution thread
– Schedules updates
– Encapsulation of protocol-specific details
– Extensibility
– Control over active execution threads
– Fault tolerance
  – Server unreachable
  – Failure / timeout of individual connections
– Time zones and date ambiguity considered

Outline
Introduction (5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions

MARIAN Layers
[Figure: Users; User Interface Layer; User Information Layer; Search Engine Layer; Database Layer; User Search Services; Recommendation Services, etc.; Analysis, Indexing, Linking]

5SL Source Description
[Figure: an NDLTD/NUDL/Digital Library user; MARIAN Mediation Middleware with a Local Data Store and a Wrapper Generator; queries + results passing through per-collection wrappers. Formats: Dublin Core, SOIF, MARC, RFC1807. Protocols: Harvest, Open Archives, Z39.50, Dienst. Collections: German PhysDis Collection, VT (VT OAI wrapper), Greek Hellenic Dissertations Collection, MIT ETD Collection, …]

Part of Hierarchy of MARIAN Classes
[Figure: Digital Information Object; Structured Document; Text; English Text; Non-English European Language Text; Korean Text; Controlled String; Person’s Name; Relevant Document Structure]

MARIAN-Phronesis Interoperability
CS6604 Fall 2000 Project
Tracy Lewis, Ryan Richardson, Kim Woods

MARIAN-Phronesis V1 Architectural Diagram
[Figure: MARIAN Search Page; MARIAN Query; CGI Script; PHRONESIS Query; PHRONESIS; PHRONESIS Results; CGI Script; Create object instance; Display to user]

[Screenshots: MARIAN-Phronesis Login Page; Query in Spanish]

Outline
Introduction (5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions

Conclusions
– Education is an important application of DLs
– Having a framework and theory may lead to better (more effective) systems and broader applicability
  – 5S
  – MARIAN
– Interoperability is part of the DL grand challenge
  – OAI