Collaboration on Digital Libraries
NEC, Dec. 27, 2000
Edward A. Fox
fox@vt.edu   http://fox.cs.vt.edu
CS, DLRL, Internet TIC
Virginia Tech, Blacksburg, VA, USA

Acknowledgements (Selected)
- Mentors: JCR Licklider, Michael Kessler, Gerard Salton
- Sponsors: Adobe, IBM, Microsoft, NLM, NSF, OCLC, SOLINET, SURA, UNESCO, US Dept. of Ed. (FIPSE), …
- VT Faculty/Staff: Tony Atkins, Debra Dudley, John Eaton, Gwen Ewing, Peter Haggerty, JAN Lee, Gail McMillan, Manuel Perez, Len Peters, James Powell, …
- VT Students: Emilio Arce, Fernando Das Neves, Brian DeVane, Robert France, Marcos Goncalves, Scott Guyer, Robert Hall, Brian Hobbs, Neill Kipp, Paul Mather, Tim McGonigle, Todd Miller, Constantinos Phanouriou, William Schweiker, Ohm Sornil, Hussein Suleman, Patrick Van Metre, Laura Weiss, …

URLs
- http://fox.cs.vt.edu
- http://ei.cs.vt.edu/~dlib (Courseware)
- http://www.dlib.org (D-Lib Magazine)
- www.smete.org and later www.nsf.gov/nsdl
- www.ndltd.org and www.theses.org
- www.cstc.org (CSTC and JERIC)
- www.openarchives.org
- www.jcdl.org (JCDL’2001 – June 24-28)

Digital Library Courseware
http://ei.cs.vt.edu/~dlib/
- WWW pages or large PDF copy files
- CourseInfo quizzes based on books by Michael Lesk (MKP.com) and William Arms (MIT Press)
- Contents based on books, with other popular topics added (e.g., agents)
- Separate pages to supplement: Definitions, Resources (People, Projects), and References

JCDL 2001
First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg)
http://www.jcdl.org
June 24-28, 2001 in Roanoke, VA
Conference Committee:
- General Chair: Edward A. Fox, Virginia Tech
- Program Chair: Christine Borgman, UCLA
- Treasurer: Neil Rowe, Naval Postgraduate School
- Posters Chair: Craig Nevill-Manning, Rutgers U.

Locating Digital Libraries in Computing and Communications Technology Space
[Figure: Digital Libraries technology trajectory: intellectual access to globally distributed information. Axes: Communications (bandwidth, connectivity); Computing (flops); Digital content (less to more).]
(Slide from S. Griffin, NSF)

PetaPlex Complex
[Diagram: FRONT END MACHINE (RS/6000, 1G RAM, 4 Proc.) connected to Service Machines 1-4 and an array of Nanoservers.]

PetaPlex Digital Library Machine (“super” object store)
- Parallel computer / storage utility
- Research: inverted files, video server, … (supported by IBM, AOL, NSF, …)
- Knowledge Systems Incorporated is supplying VT-PetaPlex-1 with 2.5 terabytes through 100 nodes, each with: net connection + 25GB disk + 233 MHz Pentium + Linux

MARIAN
Multiple Access Retrieval of Information with Annotations (Marian the Librarian …)
- Evolved from CODER system to a distributed Online Public Access Catalog (OPAC), then DL backend, now becoming a full DL system
- From C/C++ to Java
- Future: NDLTD, NUDL, PetaPlex
- Use for campus collection management
- Use for www.theses.org as centralized system with gateway services: OAI, Harvest, Z39.50, …

MARIAN Layers
[Diagram: multiple Users above a stack of layers: User Interface Layer, User Information Layer, Search Engine Layer, Database Layer.]

MARIAN Parallelism
[Chart: Java part response time vs. query rate comparison (type 1 requests). Y axis: response time (ms), 0 to 4000. X axis: query rate (#/min), 0 to 500. Series: all modules in one machine; one "webgate"; two "webgate"s; four "webgate"s.]

MARIAN/DEByE Mediation Middleware
[Diagram; labels recoverable from the figure:
- Top: NDLTD/NUDL/Digital Library User; Search Services; Recommendation Services, etc.; Analysis; Indexing; Linking
- Middleware: 5SL Source Description; Fusion Layer; Wrapper Generator; Additional Evidential Information; Belief Network Layer; Local Data Store; Queries + Results
- Wrappers, one per collection
- Formats: Dublin Core, SOIF, MARC, RFC1807
- Protocols: Harvest protocol, Open Archives protocol, Z39.50 protocol, Dienst protocol, ...
- Collections: German PhysDis Collection, VT OAI, Greek Hellenic Dissertations Collection, MIT ETD Collection, ...]

ENVISION
NSF “A User-Centered Database from the Computer Science Literature” (1991-93)
- Collected bib/typesetter data, converted to SGML
- Scanned thousands of page images
- MARIAN search engine - can be made available (also applied to the Virginia Tech library catalog)
- Used as part of a prototype object-based DL, with tailored visualization interface (L. Nowell dissertation)

DL-Related Timeline
[Timeline figure, 1985 to 2000. Items shown: PCs; HyperCard; Hypertext Conf.; SGML; TEI; WWW; JPEG, MPEG; Scholarly EPub in U’s; ETDs; NDLTD; xxx; CSTR; NCSTRL; CoRR; OAI; PDF; Java; DC; RDF; XML; MPEG-7; Proposed DLI; DLI; DLI2; NSDL; Ugrad DL (CSTC, iLumina, …); (Envision, EI).]

Information Life Cycle
Borgman et al.: Workshop Report on Social Aspects of Digital Libraries: http://www-lis.gseis.ucla.edu/DL/

Core of DL
- Collecting – Authoring, Repositories, Archives, Museums, …
- Organizing – Packaging of Data and Metadata, Storing – Naming/Identifying and Cataloging – Classification, Clustering, …
- Serving – Indexing, Linking, Summarizing, Visualizing – Browsing, Accessing, Searching, Filtering, Retrieving, Distributing, Using, …

Digital Libraries Shorten the Chain from
[Diagram: Editor, Reviewer, Publisher, A&I, Consolidator, Library]

DL = Users Direct (Organized Artifact Mediated Communication)
[Diagram: Digital Library at the center, surrounded by Author, Teacher, Reader, Learner, Reviewer, Editor, Dr., Patient, Librarian]

Author tools www.physik.uni-oldenburg.de/EPS/mmm

A Digital Library Case Study
- Domain: graduate education, research
- Genre: ETDs = electronic theses & dissertations
- Submission: http://etd.vt.edu
- Collection: http://www.theses.org
- Project: Networked Digital Library of Theses & Dissertations http://www.ndltd.org (NDLTD – remember: ND LTD / NDL TD)
  (also, newer NUDL: Networked University Digital Library, with e-courseware, etc.)
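
Note on the MARIAN/DEByE mediation slide: the diagram shows one wrapper per collection feeding a fusion layer. The following is only a minimal Python sketch of that general idea, not the MARIAN or DEByE implementation. The function names, the sample records, and the choice of three Dublin Core element names (identifier, title, creator) as the common form are all assumptions made for illustration. The MARC tags (001, 245, 100) and the RFC1807 field names (ID, TITLE, AUTHOR) are heavily simplified, and the belief network layer is replaced by plain duplicate removal.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, str]
Wrapper = Callable[[Record], Record]


def wrap_marc(rec: Record) -> Record:
    """Map a simplified MARC-like record (tag -> value) to common field names."""
    return {"identifier": rec["001"], "title": rec["245"], "creator": rec["100"]}


def wrap_rfc1807(rec: Record) -> Record:
    """Map a simplified RFC1807-like record to common field names."""
    return {"identifier": rec["ID"], "title": rec["TITLE"], "creator": rec["AUTHOR"]}


def wrap_dublin_core(rec: Record) -> Record:
    """Records already in the common form pass through (keeping only known fields)."""
    return {key: rec[key] for key in ("identifier", "title", "creator")}


def search_collection(records: Iterable[Record], wrapper: Wrapper, term: str) -> List[Record]:
    """Wrap every record, then keep those whose title contains the query term."""
    needle = term.lower()
    wrapped = (wrapper(rec) for rec in records)
    return [rec for rec in wrapped if needle in rec["title"].lower()]


def fuse(result_lists: Iterable[List[Record]]) -> List[Record]:
    """Merge per-collection results, dropping records with a repeated identifier."""
    seen = set()
    merged: List[Record] = []
    for results in result_lists:
        for rec in results:
            if rec["identifier"] not in seen:
                seen.add(rec["identifier"])
                merged.append(rec)
    return merged


# Hypothetical sample collections, one per metadata format.
marc_collection = [{"001": "a-1", "245": "Parallel Inverted Files", "100": "Author One"}]
rfc1807_collection = [
    {"ID": "b-7", "TITLE": "Inverted Files for Video Retrieval", "AUTHOR": "Author Two"},
    {"ID": "b-8", "TITLE": "An Unrelated Topic", "AUTHOR": "Author Three"},
]
dc_collection = [{"identifier": "a-1", "title": "Parallel Inverted Files", "creator": "Author One"}]

sources: List[Tuple[List[Record], Wrapper]] = [
    (marc_collection, wrap_marc),
    (rfc1807_collection, wrap_rfc1807),
    (dc_collection, wrap_dublin_core),
]

hits = fuse(search_collection(records, wrapper, "inverted") for records, wrapper in sources)
for hit in hits:
    print(hit["identifier"], "|", hit["title"], "|", hit["creator"])
```

Keeping each wrapper as a small function means a new collection format only needs one more mapping; the search and fusion steps stay unchanged.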
Status of the Local Project
- Approved by university governance Spring 1996; required starting 1/1/97
- Submission & access software in place
- Submission workshops for students (and faculty) occur often: beginner/adv.
- Faculty training as part of Faculty Development Initiative
- Over 3000 ETDs in collection
  – Some have audio, video, large images, software, …
  – Millions of accesses/yr
  – 100s to 1000s per work

What are the long term goals?
- Attract all TDs/yr: 50K D-US, 25K D-Germany, 10K TD-Canada, … >200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …)
- Dramatic increase in knowledge sharing: literature reviews, bibliographies, …
- Services providing lifelong access for students: browse, search, prior searches, citation links
- Hundreds/thousands of downloads / year / work

[Workflow diagram: Student Gets Committee Signatures and Submits ETD -> Signed -> Grad School -> Library Catalogs ETD, Access is Opened to the New Research -> WWW, NDLTD]

NDLTD US University Members (44)
- Air University (Alabama)
- Baylor University
- Brigham Young University (part, whole)
- Caltech
- Clemson University
- College of William & Mary
- Concordia University (Illinois)
- East Carolina University
- East Tenn. State U. – require fall 2000
- Florida Institute of Technology
- Florida International University
- George Washington University
- Louisiana State University
- Marshall University (W. Va.)
- Miami University of Ohio
- Michigan Tech
- Mississippi State University
- MIT
- Naval Postgraduate School (CA)
- New Mexico Tech
- North Carolina State University
- Penn. State University
- Rochester Institute of Tech.
- U. of Colorado Health Science Center
- U. of Florida
- U. of Georgia
- University of Hawaii, Manoa
- U. of Iowa
- U. of Kentucky
- U. of Maine
- U. of North Texas – required since 8/99
- U. of Oklahoma
- U. of South Florida
- U. of Tennessee, Knoxville
- U. of Tennessee, Memphis
- U. of Texas at Austin – required in 2001
- U. of Virginia
- U. Wisconsin - Madison
- Vanderbilt U.
- Virginia Commonwealth U.
- Virginia Tech - required since 1/97
- West Virginia U. - required fall 1998
- Western Michigan U.
- Worcester Polytechnic Inst.

National / Regional Projects
- Australia
  – U. New South Wales (lead)
  – U. of Melbourne
  – U. of Queensland
  – U. of Sydney
  – Australian National U.
  – Curtin U. of Technology
  – Griffith U.
- Germany
  – Humboldt University (lead)
  – 3 other universities
  – 5 learned societies: Math, Physics, Chemistry, Sociology, Education
  – 1 computing center
  – 2 major libraries
- Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es:
  – Universitat de Barcelona
  – Universitat Autònoma de Barcelona
  – Universitat Politècnica de Catalunya
  – Universitat Pompeu Fabra
  – Universitat de Girona
  – Universitat de Lleida
  – Universitat Rovira i Virgili
  – Universitat Oberta de Catalunya
  – Biblioteca de Catalunya
- OhioLink
- South Africa: ECHEA/SEALS
- India, Portugal, …

Other Countries with Members
Belgium, Brazil, Canada, Germany, Hong Kong, India, Italy, Korea, Mexico, Netherlands, Norway, Russia, Singapore, S. Africa, S. Korea, Spain, Taiwan, UK

[Diagram labels: Build Local ETD Site; ETD Workshop/Training; Digital Library Policies; Inspection/Approval]

CS Teaching Center (CSTC)
- Collection of reviewed online resources used to aid in teaching of Computer Science
- Supports author submission and peer-review process for new ACM Journal of Educational Resources In Computing (JERIC)
- Connected with NSDL (NSF 00-44)
- http://www.cstc.org

CS Teaching Center (CSTC)
- Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
- Learners benefit from having well-crafted modules that have been reviewed and tested.
- Use digital libraries as a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
[See NSF NSDL - National Science (math, engineering, technology education) Digital Library (formerly SMETE-lib) at www.dlib.org/smete/public/smete-public.html; www.smete.org]

iLumina: NSF NSDL grant with COLLEGIS Research Institute/Eduprise, UNCW, TCNJ, …

[Screenshots: Browsing (1), Browsing (2)]

Programmatic History (From Lee Zia, NSF)
- NSDL Program NSF: FY 00-02; DL Operational Fall, 2002
- DLs & UG Earth Systems Education initiated FY 99, continuing
- DLI 2 Special Emphasis in Undergrad Education FY 98-99
- DLI 2 - NSF, et al., initiated in FY 98, continuing
- Digital Libraries Initiative (DLI 1) - NSF/NASA/ARPA, FY 94-97

Expectations of Tracks
- Core Integration: to coordinate a distributed alliance of resource collection and service providers, and to ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
- Collections: to aggregate and actively manage a subset of the digital library’s content within a coherent theme or specialty
- Services: to increase the impact, reach, efficiency, and value of the digital library in its fully operational form
- Targeted Research: to have immediate impact on one or more of the other three tracks

Tracks & 29 Projects
- 6 Core Integration: Columbia, Cornell, E.Michigan/MERIT, UCAR, UCB, UMissouri/NCSA (Biology, Eng., Teacher Ed.)
- 13 Collections: Atmosphere, Biology, Biosciences, Earth Systems, Engineering, Health Sciences, Math
- 9 Services: Competitive Intelligence, Component Environment, Earth Systems J., Metadata NLP, Managing LOs, Peer Review, Video
- 1 Targeted Research: Paths

NSDL Spine
[Diagram; labels recoverable from the figure: Portals & Clients; NSDL Services; Other NSDL Services; NSDL Collections (full-service collections); Referenced Items & Collections; Core Collection-Building Services (harvesting, persistence, protocol mediation); Core Usage / CI Services (annotation, query transform, topic-map registry, personalization, discussion).]
(Slide from Dave Fulker, Bill Arms – 11/2/2000)

Our Collaboration for NSDL: PARTNERS
- Hofstra
- Villanova
- Penn State (with NEC)
- Virginia Tech
- ACM, IEEE-CS, Morgan Kaufmann, …

Our Collaboration for NSDL: FUNDING
- $1M for 2 years, starting 9/1/2001 - NSF
- $225K: Hofstra (1 GRA, 1 PI)
- $175K: Villanova (1 GRA, 1 PI)
- $175K: Penn State (1 GRA, 1 PI)
- $425K: VT (4 GRAs, 3 PIs: Fox, Lee, Perez)
- ACM, IEEE-CS, Morgan Kaufmann, …

Our Collaboration for NSDL: STRENGTHS
- PetaPlex, MARIAN, NDLTD, CSTC, JERIC
- SIGCSE: SIG, Conference, Bulletin
- History as integrating theme; adding demos
- Special support for Hispanic community
- Niche portals, search engines, links across collections, citation data --- for levels: undergrad, high school, middle school, etc.

Our Collaboration for NSDL: PROPOSAL PLAN
- Student project completed Fall 2000
- Kate will continue in Spring 2001 through an independent study
- Meetings: John visited VT, Boots and I visit Hofstra today, I visit NEC on 12/27, John visits VT again, …
- Get support letters, refine proposal, …

Our Collaboration for NSDL: DOCUMENTS
- “Computing Packet See Digital Library (CoDL)” prepared by student group (their slides next)
- Contents:
  – Project report
  – CoDL proposal outline
  – Proposals from some successful NSDL groups