University Electronic Publishing through Digital Libraries: Courseware, Theses and Dissertations
Singapore - Dec. 2002
Edward A. Fox
fox@vt.edu
http://fox.cs.vt.edu
CS, DLRL, Internet TIC, NDLTD, CITIDEL, NSDL, …
Virginia Tech, Blacksburg, VA, USA

Acknowledgements (Selected)
• Sponsors: ACM, Adobe, IBM, Microsoft, NSF (Grants CDA-9312611; DUE-0121741, 0136690, 0121679; IIS-0080748, 0086227, 0002935, and 9986089), OCLC, SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, …
• Faculty/Staff (now): Boots Cassel, Debra Dudley, Lee Giles, Rex Hartson, John Impagliazzo, Deborah Knox, JAN Lee, Kurt Maly, Gail McMillan, Manuel Perez, Muhammad Zubair, …
• Students: Fernando Das Neves, Marcos Goncalves, Paul Mather, Ryan Richardson, Priya Shivakumar, Hussein Suleman, Wensi Xi, …
• UNESCO Analytical Survey: Leonid Kalinichenko

Outline
• Case Study: NDLTD
• Case Study: CSTC
• Case Study: CITIDEL
• Interoperability: OAI, ODL
• Conclusions

A Digital Library Case Study
Project: Networked Digital Library of Theses & Dissertations (NDLTD), http://www.ndltd.org
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org

The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
• Training Authors
• Expanding Access
• Preserving Knowledge
• Improving Graduate Education
• Enhancing Scholarly Communication
• Empowering Students & Universities
Leader of the Worldwide ETD (Electronic Thesis and Dissertation) Initiative

[Diagram labels: NDLTD; Grad Program; IT; Library; Ed. (Tech)]

Key Ideas:
• Scalability
• Networked infrastructure
• University collaboration
• Workflow, automation
• Education is the rationale
• Maximal access
• 8th graders vs. grads
• Authors must submit
• Standards
• PDF, SGML, MM, MARC, DC, URNs, federated search

What led to today’s meeting?
• 1987 mtg in Ann Arbor: UMI, VT, …
• 1992 mtg in Washington: CNI, CGS, UMI, VT and 10 universities with 3 reps each
• 1993 mtg in Atlanta to start Monticello Electronic Library (regional, US Southeast): SURA, SOLINET
• 1994 mtg at VT: std: PDF + SGML + multimedia objects
• 1996 funding by SURA, US Dept. of Education (FIPSE)
• 1997 meetings in UK, Germany, ...
• 1998 – 1st symposium – Memphis (20)
• 1999 – 2nd symposium – Blacksburg (70)
• 2000 – 3rd symposium – St. Petersburg (225)
• 2001 – 4th symposium – Caltech (200)
• 2002 – 5th symposium – BYU; 2003 – Berlin; 2004 – Kentucky

What are the long term goals?
• 400K US students / year getting grad degrees are exposed / involved
• 200K/yr rich hypermedia ETDs that may turn into electronic portfolios (images, video, audio, …)
• Dramatic increase in knowledge sharing: literature reviews, bibliographies, …
• Services providing lifelong access for students: browse, search, prior searches, citation links
• Hundreds/thousands of downloads / year / work

[Diagram: ETD workflow. Steps shown:]
• Convene Local Planning Group
• Build Local ETD Site
• ETD Workshop/Training
• Student Prepares Thesis/Dissertation
• Student Defends & Finalizes ETD
• Student Gets Committee Signatures and Submits ETD
• Graduate School Approves ETD, Student is Graduated
• Library Catalogs ETD, Access is Opened to the New Research
[Other diagram labels: Digital Library; Policies; Inspection/Approval; NDLTD; Literature; Computer Resources; Research; My Thesis; ETD; Signed; Grad School; Ph.D.; WWW]

NDLTD National / Regional Projects
• Australia
  • U. New South Wales (lead)
  • U. of Melbourne
  • U. of Queensland
  • U. of Sydney
  • Australian National U.
  • Curtin U. of Technology
  • Griffith U.
• Germany
  • Humboldt University (lead)
  • 3 other universities
  • 5 learned societies: Math, Physics, Chemistry, Sociology, Education
  • 1 computing center
  • 2 major libraries
• OhioLINK: 79 colleges/univs
• Consorci de Biblioteques Universitàries de Catalunya, as group, www.cbuc.es: 9 sites
• India
• Korea
• Brazil
• UK (British Library, JISC, Edinburgh)
• UNESCO (especially Latin America, Eastern Europe, Africa)

Some Countries
• Australia
• Belgium
• Brazil
• Canada
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• India (Hyderabad)
• Italy
• Korea
• Mexico
• Netherlands
• Norway
• Russia
• Singapore
• S. Africa (Rhodes U.)
• S. Korea
• Spain
• Sudan
• Sweden
• Taiwan
• UK
• USA

Institutional Members
• British Library
• Cinemedia
• Coalition for Networked Information (CNI)
• Committee on Institutional Cooperation (CIC)
• Consorci de Biblioteques Universitàries de Catalunya
• Diplomica.com
• Dissertation.com
• Dissertationen Online (Germany)
• ETDweb, a Division of Answer4.com
• Ibero-American Science & Technology Education Consortium (ISTEC)
• National Documentation Centre (NDC), Greece
• National Library of Portugal (for all universities)
• OCLC Online Computer Library Center
• OhioLINK
• Organization of American States (SEDI/OAS)
• Southeastern Library Network (SOLINET)
• UNESCO (www.unesco.org/webworld/etd)

Access Possibilities
[Diagram labels: Web search engines; www.theses.org; www.openarchives.org; library catalog clients; Virginia Tech; MIT; National Library of Portugal; CBUC (Spain); OhioLINK; 3rd Party Services (e.g., UMI); National Projects: AU, GE, …]

ETD-MS
• ETD Metadata Standard
• XML-encoded metadata standard (content and encoding) for Electronic Theses and Dissertations (ETDs)
  • in part conforming to Dublin Core (DC)
  • using Unicode
  • (optionally / later using RDF)
• Well specified relationship with MARC

NDLTD Members and ETD-MS
• NDLTD members will
  • Share metadata for their ETDs
    • Providing it in ETD-MS
    • Or, if they use a version of MARC locally, working to have it eventually shared in either MARC21 or UNIMARC
  • Run OAI, either locally or in consortia, so their metadata can be harvested, according to necessary terms and conditions

Some recent additions
• ETD individuals support
  • http://etdindividuals.dlib.vt.edu:9090
• ETD discussion (e-prints)
  • http://ndltdpapers.dlib.vt.edu:9090
• Conference papers and presentations
  • http://www.ndltd.org/WVUproc.htm
• Marcel Dekker book in publication

What are plans at VT?
• LOCKSS welcomed us
  • Lots of Copies Keep Stuff Safe
• MARIAN: harvest, crawl/scrape, fed search
• Metadata crosswalks and format converters
• XML schema for ETDs
• Open Digital Libraries: easy to add services!
  • http://oai.dlib.vt.edu/odl

Union catalog (OCLC)
• OCLC will expand the OAI data provider on TDs
  • Will get data from WorldCat
  • Will harvest from all who contact them
  • Need DC and either ETD-MS or MARC
  • Will have a set for ETDs

Union catalog (VTLS, VT)
• VTLS will enhance search/browse service for ETDs
  • Will harvest from OCLC’s set of ETD records
  • Will receive through other mechanisms, too
  • Will work with MARC-21 and ETD-MS
• VT will continue to offer experimental services

NUDL (www.nudl.org) Int’l Research Support
• Networked University Digital Library
• Partners: Germany, Mexico (Puebla and Monterrey), Brazil
• Problems: Multilingual search, high performance DLs, requirements/usability, …
• Start with ETDs, then expand to other student works, portfolios, data sets, (CS) courseware, ...

CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages that become obsolete and are difficult to re-use, concentrate on small knowledge units.
• Learners benefit from having well-crafted modules that have been reviewed and tested.
• Use digital libraries to build a powerful base of support for learners, upon which a variety of courses, self-study tutorials & reference resources can be built.
Browsing (2)

JERIC
• Journal of Educational Resources in Computing
• Accessible from www.cstc.org and www.acm.org
• ACM and SIGCSE support
• Refereed and interactive
• Part of ACM Digital Library

www.CITIDEL.org
• Computing and Information Technology Interactive Digital Education Library, an NSDL Collection Track project
• Led by Virginia Tech, with co-PIs:
  • Fox (director, DL systems)
  • Lee (history)
  • Perez (user interface, Spanish support)
• Partners
  • College of New Jersey (Knox)
  • Hofstra (Impagliazzo)
  • Villanova (Cassel)
  • Penn State (Giles)

Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes
Size of Collection      Number of Collections Identified
1-5 items               100-300
6-100 items             50
101-999 items           20-35
1000+ items             10-25

Multi-dimensional Categorization
[Diagram: three categorization axes]
• Quality: Peer reviewed; Editor reviewed; Nominated; Identified by crawl
• Topic: Algorithms; Java; Multimedia
• Language: English; Spanish

CITIDEL Collection Sources
[Concept map, links labeled "include". Node labels: ACM; CSTC; Research Index; IEEE-CS; …; NCSTRL; metadata; ACM DL fulltext; SIGCSE proceedings; NEC’s data; JERIC; Experts’ finding aids; data processed w. R.I.; Borner’s info viz software repository]

CITIDEL Collection Building
[Concept map. Collection building proceeds:]
• thru Nominating
• thru Submitting, after Creating or after Composing using VIADUCT
• after Searching, Browsing thru GetSmart
• or thru Crawling, aided by Classifying using Crawlifier

Overview of CITIDEL architecture
[Diagram layers: USER PORTALS; DIGITAL LIBRARY SERVICES; REPOSITORIES]

Distributed repository structure
[Diagram labels: Digital Library Services; OAI Data Provider; OAI Data Harvester; Union Metadata Repository; Applets Repository; Laboratories Repository; Syllabi Repository; Papers Repository; ...]
Digital library architecture for local and interoperable CITIDEL services
[Diagram, three layers:]
• PORTALS: EDUCATORS; LEARNERS; ADMINISTRATORS
• SERVICES: Multilingual Searching; Browsing; Filtering; Annotating; Revising; Administering; OAI Data Provider; OAI Data Harvester
• REPOSITORIES: Union Metadata; Filtering Profiles; User Profiles; Annotations
• Remote and Peer Digital Libraries (e.g., NSDL-CIS)

Open Archives Initiative (OAI)
www.openarchives.org
openarchives@openarchives.org

The World According to OAI
[Diagram labels: Service Providers; Discovery; Current Awareness; Preservation; Data Providers]

Technical Umbrella for Practical Interoperability…
[Diagram labels: Reference Libraries; Museums; Publishers; E-Print Archives]
…that can be exploited by different communities

Tiered Model of Interoperability
• Mediator services
• Metadata harvesting
• Document models

OAI – Black Box Perspective
[Diagram layers:]
• Services: Search; Browse; Summarize; Visualize
• Metadata: OA 1; OA 2; OA 3; OA 4; OA 5; OA 6; OA 7
• Docs: DO (one per archive)

Aggregation through OAI Harvesting
[Diagram labels: CITIDEL; NCSTRL; Lite Sites; Archive; Eprints; Active; Own: History, ResearchIndex, CSTC, …; IEEE-CS, ACM, …]

Approaches to Open Archives
[Diagram labels: Build By Institution; Build By Discipline; Author; Category; Interdisciplinary; Year; Language; Query; …]

OAI Perspective
• Rethink your efforts in terms of providers of
  • Data, Services
• Reduced work for data providers
  • Tools available
  • Don’t need to offer services
• Reduced work for service providers
  • Others provide the data
  • Can use tools and systems for OAI, XOAI
• Results
  • More data becoming available
  • To more people
  • Supported by improved services

data harvesting
[Diagram labels: repository support; harvester; OAI protocol; repository; items]

selective harvesting - datestamps
[Diagram: harvest within date range; labels: repository; record; record]

selective harvesting - sets
[Diagram: harvest within set; labels: repository; sets S1, S2; record; record; record]

What is an Open Archive?
• Any WWW-based system that can be accessed through the well-defined interface of the Open Archives Protocol for Metadata Harvesting
  • … aka OAI-Compliant Repository
• No implications for:
  • Physical storage of data
  • Cost of data
  • Metadata and data formats
  • Access control to server

Sample OAI Record
<record>
  <header>
    <identifier>oai:sigir:ws3</identifier>
    <datestamp>2001-08-13</datestamp>
  </header>
  <metadata>
    <dc>
      <title>OAI Workshop at SIGIR</title>
      <creator>Hussein Suleman</creator>
      <language>English</language>
    </dc>
  </metadata>
  <about>
    <metadataID>oai:sigir:ws3md</metadataID>
  </about>
</record>

Sets
• Protocol mechanism to allow for harvesting of sub-collections
• No well-defined semantics – depends completely on local data providers
• May be defined by arrangement between data providers and service providers
• E.g., subject areas, years, author names, search queries

Protocol for Metadata Harvesting
• Service Requests
  • Identify
  • ListMetadataFormats
  • ListSets
  • GetRecord
  • ListIdentifiers
  • ListRecords
• Metadata Multiplicity
• Date Ranges
• Resumption Tokens

Example: Union Collection of ETDs
(Electronic Theses and Dissertations, for Networked Digital Library of Theses and Dissertations, NDLTD)
[Diagram labels: VIRTUA; MARIAN; Future: recommender, …; Merged Metadata Collection; Virginia Tech ETD Archive; Humboldt ETD Archive; Duisburg ETD Archive; …]
[Legend: OAI Data Provider; OAI Service Provider; OAI Harvesting]

Example: Details
[Diagram labels: Name Authority Service (e.g. OCLC); NDLTD Central; VTLS Union Catalog; NDLTD Site / Member; Librarian Verification / Validation / Enrichment / Maintenance; Student Entry; OAI Server; Local DB; MARIAN Union Catalog; Virtua MARC DB; OAI Harvester; Conversion; Local Search / Browse; Alternate MARC Transport (ftp? tapes?)]

Open Digital Library (ODL) Hypothesis (Hussein Suleman)
• Can we leverage the successful model of the OAI Protocol for Metadata Harvesting to alleviate our architectural problems?
Maybe … if Digital Libraries can be modeled as
• networks of extended Open Archives, where
  • each extended Open Archive is a
    • source of data and/or a provider of services.

Example Architecture (NDLTD)
[Diagram labels: Virginia Tech; PhysNet; Humboldt; Duisburg; Caltech; MIT; Dresden; Union Catalog; Search; Browse; Recent; Filter; User Interface]
[Legend: User Interface; OAI/ODL archive; OAI/ODL protocol]

ODL Demonstration - FrontPage

ODL Demonstration - Search

ODL Demonstration - Browse

Conclusions
• Digital libraries can help advance education.
• Singapore is invited to engage in NSDL, CITIDEL, NDLTD, and other ventures.
• UNESCO Analytical Survey on Digital Libraries in Education is recommending DLE in each nation.
• Local and national support can
  • stimulate activities, including collaboration
  • promote a sharing culture, especially in research and teaching
  • leverage others’ investments (networking, computing, …)
  • encourage / facilitate learning, innovation and problem solving

Selected Links
• CITIDEL
  • www.citidel.org
• NCSTRL
  • www.ncstrl.org
• NDLTD
  • www.ndltd.org
• NSDL
  • www.nsdl.org
• Virginia Tech Digital Library Courseware
  • http://ei.cs.vt.edu/~dlib
• Virginia Tech Digital Library Research Laboratory (DLRL)
  • http://www.dlib.vt.edu
  • (5S, 5SL, AmericanSouth.Org, CSTC, ENVISION, MARIAN, NSDL, OAI, ODL)
• Repository Explorer
  • http://purl.org/net/oai_explorer

NDLTD, More Links
• ARC Cross-Archive Search Service
  • http://arc.cs.odu.edu/
• Dublin Core Metadata Initiative
  • www.dublincore.org
• E-Prints DL-in-a-box
  • www.eprints.org
• Open Archives Initiative
  • http://www.openarchives.org
  • http://www.openarchives.org/OAI/openarchivesprotocol.htm
  • http://www.dlib.vt.edu/projects/OAI/
• XML Schema Validator
  • http://www.w3.org/2001/03/webdata/xsv
• XML Tools at W3C
  • http://www.w3.org/XML/#software
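
Example sketch: OAI service requests and selective harvesting

The service requests, date ranges, and sets from the OAI slides can be tried out with a few lines of Python. This is only a minimal sketch under stated assumptions, not the harvester used by NDLTD or CITIDEL. The base URL http://example.org/oai, the set name "etd", and the function names are hypothetical. The request parameters (verb, metadataPrefix, from, until, set) follow the Open Archives protocol document linked above. The record is the simplified one from the "Sample OAI Record" slide; responses from a live repository also carry XML namespaces, which this sketch does not handle.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# The six service requests listed on the "Protocol for Metadata Harvesting" slide.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "GetRecord", "ListIdentifiers", "ListRecords"}


def build_request(base_url, verb, **params):
    """Build the GET URL for one OAI service request (hypothetical helper).

    "from" is a Python keyword, so callers pass from_ and the trailing
    underscore is stripped before the query string is built.
    """
    if verb not in VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        if value is not None:
            query[key.rstrip("_")] = value
    return base_url + "?" + urlencode(query)


# Simplified record copied from the "Sample OAI Record" slide (no namespaces).
SAMPLE_RECORD = """<record>
  <header>
    <identifier>oai:sigir:ws3</identifier>
    <datestamp>2001-08-13</datestamp>
  </header>
  <metadata>
    <dc>
      <title>OAI Workshop at SIGIR</title>
      <creator>Hussein Suleman</creator>
      <language>English</language>
    </dc>
  </metadata>
  <about>
    <metadataID>oai:sigir:ws3md</metadataID>
  </about>
</record>"""


def parse_record(xml_text):
    """Extract the header fields and the flat metadata fields of one record."""
    record = ET.fromstring(xml_text)
    return {
        "identifier": record.findtext("header/identifier"),
        "datestamp": record.findtext("header/datestamp"),
        "metadata": {field.tag: field.text for field in record.find("metadata/dc")},
    }


def in_date_range(record, start, end):
    """Datestamp check behind selective harvesting by date range.

    YYYY-MM-DD strings sort in date order, so plain string comparison works.
    """
    return start <= record["datestamp"] <= end


if __name__ == "__main__":
    # Selective harvesting: one set, restricted to one year.
    print(build_request("http://example.org/oai", "ListRecords",
                        metadataPrefix="oai_dc", from_="2001-01-01",
                        until="2001-12-31", set="etd"))
    print(parse_record(SAMPLE_RECORD))
```

A service provider such as a union catalog would issue requests like these against each data provider in turn and merge the parsed records. Resumption tokens, which let a repository return a long list in several parts, are left out of the sketch.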