Digital Libraries: Archaeology, Automation, ETDs, and Enhancements
Edward A. Fox (fox@vt.edu)
Virginia Tech, USA
IADLC 2005
The International Advanced Digital Library Conference in Nagoya
August 25-26, 2005

Outline
• Acknowledgements
• Introduction: Life Cycle, Curric., 5S, Book
• ETANA-DL, 5S Description
• Theory and Automation
• Education: CS, ETDs
• Quality, Integration, and Automation
• Selected Links, Discussion

Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Gonçalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …

Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

Other Collaborators (Selected)
• Brazil: FUA, UFMG, UNICAMP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Univ. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia

Acknowledgements - Mentors
• JCR Licklider – undergrad advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Before, at ARPA, funded start of Internet
• Michael Kessler – BS thesis advisor
– Project TIP (technical information project)
– Defined bibliographic coupling
• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”

Acknowledgements: Support
• ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579; ITR0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS

Information Life Cycle
[Figure: information life cycle diagram. Labels: Authoring, Modifying; Using; Creating; Retention / Mining; Organizing, Indexing; Accessing, Filtering; Storing, Retrieving; Distributing, Networking]

DL Curriculum Framework
[Figure: curriculum framework chart with columns COURSE STRUCTURE, CORE DL TOPICS, and RELATED TOPICS]
• Course structure:
– Semester 1: DL collections: development/creation
– Semester 2: DL services and sustainability
• Topics shown:
– Digitization, Storage, Interchange
– Metadata, Cataloging, Author submission
– Digital objects, Composites, Packages
– Architectures (agents, buses, wrappers/mediators), Interoperability
– Spaces (conceptual, geographic, 2/3D, VR)
– Documents, E-publishing, Markup
– Multimedia streams/structures, Capture/representation, Compression/coding
– Bibliographic information, Bibliometrics, Citations
– Content-based analysis, Multimedia indexing
– Naming, Repositories, Archives
– Services (searching, linking, browsing, etc.)
– Archiving and preservation, Integrity
– Thesauri, Ontologies, Classification, Categorization
– Multimedia presentation, rendering
– Info. Needs, Relevance, Evaluation, Effectiveness
– Intellectual property rights mgmt., Privacy, Protection (watermarking)
– Routing, Filtering, Community filtering
– Search & search strategy, Info seeking behavior, User modeling, Feedback
– Info summarization, Visualization

5S Layers
• Societies
• Scenarios
• Spaces
• Structures
• Streams

5S Layers     5 Elements
Societies     Fire
Scenarios     Wood
Spaces        Earth
Structures    Metal
Streams       Water

5Ss
• Streams
– Examples: Text; video; audio; image
– Objectives: Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
• Structures
– Examples: Collection; catalog; hypertext; document; metadata
– Objectives: Specifies organizational aspects of the DL content
• Spaces
– Examples: Measure; measurable, topological, vector, probabilistic
– Objectives: Defines logical and presentational views of several DL components
• Scenarios
– Examples: Searching, browsing, recommending
– Objectives: Details the behavior of DL services
• Societies
– Examples: Service managers, learners, teachers, etc.
– Objectives: Defines managers, responsible for running DL services; actors, that use those services; and relationships among them

Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)

Hypotheses
• A formal theory for DLs can be built based on 5S.
• The formalization can serve as a basis for modeling and building high-quality DLs.

Research Questions
1. Can we formally elaborate 5S?
2. How can we use 5S to formally describe digital libraries?
3. What are the fundamental relationships among the Ss and high-level DL concepts?
4. How can we allow digital librarians to easily express those relationships?
5. Which are the fundamental quality properties of a DL? Can we use the formalized DL framework to characterize those properties?
6. Where in the life cycle of digital libraries can key aspects of quality be measured and how?

Book Parts
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix

Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies

Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies

Book Parts and Chapters - 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings

Initial ETANA-DL Member Locations
[Map showing: Canadian University College, Andrews University, CWRU, Walla Walla College, Willamette University, Virginia Tech, Vanderbilt University, Mississippi State University. Map courtesy: www.enchantedlearning.com]

[Screenshot: Lahav Website]
[Screenshot: Megiddo Opening Screen]
[Screenshot: Locus Screen: Pictures View all]
[Screenshot: Area Screen]

ETANA-DL Approach
• Applying and extending Digital Library (DL) techniques to solve key problems: making primary data available, data preservation, and interoperability
• Modeling archaeological information systems using 5S to better understand the domain and design the system and the supporting services
• Rapidly prototyping DLs that handle heterogeneous archaeological data using componentized frameworks:
– eliciting requirements
– refining metamodel and union schema
– modeling sites
– mapping
– harvesting
– providing useful services
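
The mapping and harvesting steps in the ETANA-DL Approach list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not ETANA-DL's implementation: the union schema fields, the site names (site_a, site_b), the local field names, and the functions map_record and harvest are all hypothetical.

```python
# Hypothetical union schema: these field names are illustrative only and are
# not taken from ETANA-DL.
UNION_FIELDS = ("site", "partition", "locus", "object_type", "description")

# Hypothetical per-site mappings from local field names to union field names.
SITE_MAPPINGS = {
    "site_a": {"SiteName": "site", "Square": "partition", "LocusNo": "locus",
               "Kind": "object_type", "Notes": "description"},
    "site_b": {"site": "site", "area": "partition", "locus_id": "locus",
               "artifact_class": "object_type", "remarks": "description"},
}


def map_record(source, local_record):
    """Translate one local record into the (assumed) union schema.

    Local fields without a mapping are kept under "unmapped" so that no
    primary data is silently dropped.
    """
    mapping = SITE_MAPPINGS[source]
    union = {name: None for name in UNION_FIELDS}
    union["source"] = source
    union["unmapped"] = {}
    for key, value in local_record.items():
        target = mapping.get(key)
        if target is None:
            union["unmapped"][key] = value
        else:
            union[target] = value
    return union


def harvest(sources):
    """Collect records from every source into one union catalog (a list)."""
    catalog = []
    for source, records in sources.items():
        catalog.extend(map_record(source, record) for record in records)
    return catalog


if __name__ == "__main__":
    catalog = harvest({
        "site_a": [{"SiteName": "A", "Square": "N40/W20", "LocusNo": 86,
                    "Kind": "pottery", "Color": "red"}],
        "site_b": [{"site": "B", "locus_id": 3}],
    })
    for entry in catalog:
        print(entry)
```

Keeping unmapped local fields alongside the mapped ones is one possible design choice here; it lets the union schema be refined later without re-harvesting the sites.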
[Screenshot: ETANA-DL Website]

Marking Items
[Screenshot. Caption: Marking – writing notes for a specific user]

Marked Items Display
[Screenshot. Callouts: Sender, Date, Object OAI ID; Sender Comments; Options: View Record, Add record to Items Of Interest, Re-mark item (Redirect), Unmark item (Remove item from list)]

Discussions Page
[Screenshot. Callouts: Discussions about an object; View/Post messages, create new threads]

Recommendations
[Screenshot. Caption: Items recommended on the basis of similar interests]

ETANA-DL Multi-dimensional Browsing
[Screenshot. Callouts: 3 new sites; 2 new types of artifacts]

ETANA-DL Visual Browsing Service
[Screenshot. Callouts: By site; Visual Browse]

Visual Browsing Nimrin: Topographical Drawings
[Screenshot. Callouts: Square: N40/W20; Full site; North west quadrant]

Visual Browsing Nimrin: Square information
[Screenshot. Callouts: Square: N40/W20; Locus: 86; Loci layout]

Visual Browsing Nimrin: locus sheet
[Screenshot]

Visual Browsing Bab edh-Dhra' Cemetery Pottery # 25
[Screenshots]

ETANA Societies
1. Historic and pre-historic societies (being studied)
2. Archaeologists (in academic institutes, fieldwork settings, or local and national governmental bodies)
3. Project directors
4. Technical staff (consisting of photographers, technical illustrators, and their assistants)
5. Field staff (responsible for the actual work of excavation)
6. Camp staff (e.g., camp managers, registrars, tool stewards)
7. General public (e.g., educators, learners, citizens)

ETANA Societies
• Social issues
1. Who owns the finds?
2. Where should they be preserved?
3. What nationality and ethnicity do they represent?
4. Who has publication rights?
5. What interactions took place between those at the site studied, and others? What theories are proposed by whom about this?

ETANA Scenarios
1. Life in the site in former times
2. Digital recording: the planning stage and the excavation stage
3. Planning stage: remote sensing, fieldwalking, field surveys, building surveys, consulting historical and other documentary sources, and managing the sites and monuments
4. Excavation
   1. Detailed information is recorded, including for each layer of soil, and for features such as pole holes, pits, and ditches.
   2. Data about each artifact is recorded together with information about its exact find spot.
   3. Numerous environmental and other samples are taken for laboratory analysis, and the location and purpose of each is carefully recorded.
   4. Large numbers of photographs are taken, both general views of the progress of excavation and detailed shots showing the contexts of finds.
5. Organization and storage of material
6. Analysis and hypotheses generation and testing
7. Publications, museum displays
8. Information services for the general public

ETANA Spaces
1. Geographic distribution of found artifacts
2. Temporal dimension (as inferred by archaeologists)
3. Metric or vector spaces
   1. used to support retrieval operations, and to calculate distance (and similarity)
   2. used to browse / constrain searches spatially
4. 3D models of the past, used to reconstruct and visualize archaeological ruins
5. 2D interfaces for human-computer interaction

ETANA Structures
1. Site Organization
   1. Region, site, partition, sub-partition, locus, …
2. Temporal orderings (ages, periods)
3. Taxonomies
   1. for bones, seeds, building materials, …
4. Stratigraphic relationships
   1. above, beneath, coexistent

ETANA Streams
1. successive photos and drawings of excavation sites, loci, unearthed artifacts
2. audio and video recordings of excavation activities and discussions
3. textual reports
4. 3D models used to reconstruct and visualize archaeological ruins.

5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: dependency graph of the formal definitions; "d. N" is a definition number. Node labels as extracted: relation (d. 1); sequence, graph (d. 6), (d. 3); measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16) spaces; language (d.5); sequence, tuple (d. 4)*, (d. 3); function, state (d. 18), event (d.10), (d. 2); 5S; grammar (d. 7); streams (d.9); structures (d.10); spaces (d.18); scenarios (d.21); societies (d. 24); services (d.22); structured stream (d.29); digital object (d.30); structural metadata specification (d.25); transmission, collection (d. 31), (d.23); repository (d. 33); descriptive metadata specification (d.26); metadata catalog (d.32); indexing service (d.34); hypertext (d.36); browsing service (d.37); searching service (d.35); digital library (minimal) (d. 38)]

[Figure: 5S concept map with regions Streams, Structures, Spaces, Scenarios, Societies.
Nodes: text, audio, video, image; metadata specifications, Collection, Catalog, digital object, Index, Repository; Measurable, Measure, Topological, Vector, Metric, Probabilistic; Service, Scenario, event; Service Manager, Actor.
Relations: contains, describes, is_version_of/cites/links_to, stores, is_a, employs, produces, inherits_from/includes, runs, extends, reuses, precedes, happens_before, uses, participates_in, recipient, association, operation, executes, redefines, invokes]

Infrastructure Services
• Repository-Building
– Creational: Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
– Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
• Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
Information Satisfaction Services
• Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing

[Figure: composition of Infrastructure Services (Add_Value: Rating, Indexing, Training) with Information Satisfaction Services (Browsing, Requesting, Recommending, Searching, Filtering, Binding, Visualizing, Expanding query). Edges are labeled p or e. Data labels: {(digital object, actor, rate)}; Index; Society actor; handle; anchor; classifier; user model; query/category; {digital object}; Collection, {digital object}; query; binder; fundamental; composite; transformer; space; query’]

The XML Log Format
[Figure: tree of log elements. Labels: Log, Transaction, SessionId, MachineInfo, Timestamp, Event, StatusInfo, Search, SearchBy, SessionInfo, RegisterInfo, Timestamp, Statement, Action, Browse, QueryString, Statement, Update, Collection, Catalog, StoreSysInfo, Timeout, PresentationInfo]

5S Modeling -> Systems
[Figure. Node labels: Domain Concepts (theory); Modeling Language (Meta-Model); DL Architecture Model; Running DL; “real” world object; Actors; “Real” World. Edge labels: represented by, instance of, interpreted as, used to compose, abstracted from]

Tools/Applications
[Figure. Labels: 5S Meta Model; DL Expert; 5SGraph; DL Designer; Practitioner; 5SL DL Model; Teacher; component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …); Researcher; 5SLGen; Tailored DL; Logging Module; XML Log]

A Minimal DL in the 5S Framework
[Figure. Labels: Streams, Structures, Spaces, Scenarios, Societies; Structured Stream; Structural Metadata Specification; Descriptive Metadata Specification; services (indexing, browsing, searching); hypertext; Digital Object; Collection; Metadata Catalog; Repository; Minimal DL]

A Minimal ArchDL in the 5S Framework
[Figure. Labels: Streams, Structures, Spaces, Scenarios, Societies; Structured Stream; Descriptive Metadata specification; SpaTemOrg; StraDia; Arch Descriptive Metadata specification; ArchObj; services (indexing, browsing, searching); hypertext; ArchDO; Arch Metadata catalog; ArchColl; ArchDColl; ArchDR; Minimal ArchDL]

Overview of 5SGraph
[Screenshot. Callouts: Workspace (instance model); Structured toolbox (metamodel)]

Computing and Information Technology Interactive Digital Educational Library (CITIDEL)
• Domain: computing /
information technology
• Genre: one-stop-shopping for teachers & learners: courseware (CSTC, JERIC), leading DLs (ACM, IEEE-CS, DB&LP, CiteSeer), PlanetMath.org, NCSTRL (technical reports), …
• Submission & Collection: sub/partner collections
www.citidel.org

Digital library architecture for local and interoperable CITIDEL services
[Figure. Layers: PORTALS, SERVICES, REPOSITORIES. Labels: EDUCATORS, LEARNERS, ADMINISTRATORS; Multilingual Searching, Browsing, Filtering, Annotating, Revising, Administering; Union Metadata, Filtering Profiles, User Profiles, Annotations; OAI Data Provider; OAI Data Harvester; Remote and Peer Digital Libraries (e.g., NSDL-CIS)]

CITIDEL -> NSDL
• A collection project in the National STEM (science, technology, engineering, and mathematics) education Digital Library – NSDL
• National Science Digital Library
• www.nsdl.org
• (Next slides courtesy Lee Zia, NSF)

NSDL Program Tracks
• Core Integration: coordinate a distributed alliance of resource collection and service providers; and ensure reliable and extensible access to and usability of the resulting network of learning environments and resources
• Collections: aggregate and actively manage a subset of the digital library’s content within a coherent theme / specialty
• Services: increase the impact, reach, efficiency, and value of the digital library in its fully operational form
• Targeted (Applied) Research: have immediate impact on one or more of the other three tracks
• Pathways: large efforts across broad ranges of areas or approaches or users

NSDL Information Architecture
Essentially as developed by the Technical Infrastructure Workgroup
[Figure. Labels: Portals & Clients; User Interfaces; Core NSDL “Bus”; NSDL Collections; Collection Building; referenced items & collections; Special Databases; Core Collection-Building Services (metadata gathering, protocols, harvesting); NSDL Services; Other NSDL Services; Usage Enhancement; Core Services / CI Services (information retrieval, browsing, authentication, personalization, discussion, annotation)]

Digital Libraries in Education
• Analytical Survey, ed. Leonid Kalinichenko
• © 2003, www.iite-unesco.org, info@iite.ru
• Transforming the Way to Learn
• DLs of Educational Resources & Services
• Integrated/Virtual Learning Environment
• Educational Metadata
• Current DLEs: US (NSDL, DLESE, CITIDEL, NDLTD), Europe (Scholnet, Cyclades), UK (Distributed National Electronic Resource)

A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org

[Figure: ETD workflow. Labels: Student Gets Committee Signatures and Submits ETD; Signed; Grad School; Library Catalogs ETD, Access is Opened to the New Research; WWW; NDLTD]

[Screenshot: OCLC SRU Interface]

ETD Union Search Mirror Site in China (CALIS)
(http://ndltd.calis.edu.cn – popular site!)

Board of Directors
• Suzie Allard (ETD 2004, U. Kentucky)
• Denise A. D. Bedford (World Bank)
• Julia C. Blixrud (ARL, SPARC)
• José Luis Borbinha (Natl Lib Portugal)
• Alex Byrne (ETD 2005, ADT: Australia)
• Tony Cargnelutti (ETD 2005, Australia)
• Vinod Chachra (VTLS)
• Susan Copeland (RGU, UK)
• Jude Edminster (Bowling Green St. U.)
• Scott Eldredge (Treasurer, ETD 2002, BYU)
• Edward A. Fox (Exec Director, Virginia Tech)
• John H. Hagen (West Virginia U.)
• Thomas B. Hickey (OCLC)
• Christine Jewell (U. Waterloo, Canada)
• Delphine Lewis (ProQuest)
• Joan K. Lippincott (CNI)
• Mike Looney (Adobe)
• Gail McMillan (Secretary, Virginia Tech)
• Joseph Moxley (ETD 2000, USF)
• Eva Müller (U. Uppsala, Sweden)
• Ana Pavani (PUC Rio, Brazil)
• Axel Plathe (UNESCO, Paris)
• Sharon Reeves (National Library Canada)
• Peter Schirmbacher (ETD 2003, Humboldt)
• Hussein Suleman (U. Cape Town, S. Africa)
• Shalini R. Urs (U. Mysore, India)
• Eric F. Van de Velde (ETD 2001, Caltech)

Selected Projects / Sponsors
• Australia (ADT)
• Brazil (BDT, IBICT)
• Canada
• Catalunya
• Chile (Cybertesis)
• Germany
• India (Vidyanidhi)
• Korea
• OhioLINK: 79 colleges/univs
• Portugal (National Library)
• South Africa
• UK (British Library, JISC, Edinburgh, …)
• UNESCO (especially Latin America, Eastern Europe, Africa)
• Venezuela

Why ETD? Short Answer
• For Students:
– Gain knowledge and skills for the Information Age
– Richer communication (digital information, multimedia, …)
• For Universities:
– Easy way to enter the digital library field and benefit thereby
• For the World:
– Global digital library – large, useful, many services
• General:
– Save time and money
– Increased visibility for all associated with research results

Describing Quality in Digital Libraries
• What’s a “good” digital library?
– Central Concept: Quality!
– Hypotheses of this work:
  • Formal theory can help to define “what’s a good digital library” by:
    • New formalizations of quality indicators for DLs within our 5S framework
    • Contextualizing these measures within the Information Life Cycle

Quality and the Information Life Cycle
[Figure: information life cycle annotated with quality dimensions.
Phases: Active, Semi-Active, Inactive.
Activities: Creation (Authoring, Modifying); Describing, Organizing, Indexing; Storing, Accessing, Filtering; Archiving; Distribution, Networking; Seeking (Searching, Browsing, Recommending); Utilization; Retention, Mining; Discard.
Quality dimensions: Accuracy, Completeness, Conformance, Timeliness, Similarity, Preservability, Pertinence, Significance, Accessibility, Relevance]

Formal Definition of DL Integration
• $DL_i = (R_i, DM_i, Serv_i, Soc_i)$, $1 \le i \le n$
– $R_i$ is a network accessible repository
– $DM_i$ is a set of metadata catalogs for all collections
– $Serv_i$ is a set of services
– $Soc_i$ is a society
• UnionRep
• UnionCat
• UnionServices
• UnionSociety

Formal Definition of DL Integration (Cont.)
• DL integration problem definition: Given n individual libraries, integrate the n DLs to create a UnionDL.

Architecture of a Union DL
[Figure: two source DLs feeding a Union DL]
• DL1: Society (archaeologists), Service (Searching), Catalog1, Repository1
• DL2: Society (General Public), Service (Browsing), Catalog2, Repository2
• Union DL: Union Society (Archaeologists, General Public), Union Service (Harvesting, Mapping, Searching, Browsing, Clustering, Visualization), Union Catalog, Union Repository

[Screenshot: Example of Union Service: CitiViz]

[Figure: Multidimensional Browsing: Percentages of Animal Bones Across Nimrin Cultural Phases]

[Screenshot. Callouts: local schema; global schema]

[Screenshot: Mapping recommendation]

[Figure: DL development process with 5SSuite.
Phases: Requirements (1), Analysis (2), Design (3), Implementation (4).
Labels: 5S Meta Model; DL Expert; DL Designer; 5SGraph; Practitioner; 5SL DL Model; component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …); Teacher; Researcher; Tailored DL Services; 5SLGen.
5SSuite: 5SGraph, 5SGen, Mapping Tool]

[Figure: applying 5SSuite to ETANA-DL. Labels: ArchDL Expert; 5S Archaeology MetaModel; ArchDL Designer; 5SGraph; Scenario Sub-model; ETANA-DL Union Services Descriptions; ETANA-DL Metadata Format; VN Metadata Format; HD Metadata Format; VN Catalog; HD Catalog; Mapping Tool; Wrapper4VN; Wrapper4HD; Harvesting, Mapping, Searching, Browsing, …; Inverted Files; Search Service; XOAI; Browse DB; Browse Service; Component Pool; Services DB; 5SGen; Other ETANA-DL Services; Web Interface; Union Catalog]

Selected Links - http://fox.cs.vt.edu
• CITIDEL (computing education resources)
– www.citidel.org
• NCSTRL (computing technical reports)
– www.ncstrl.org
• NDLTD (electronic theses and dissertations worldwide)
– www.ndltd.org and etdguide.org
• NSDL (National Science Digital Library)
– www.nsdl.org
• OAI (Open Archives Initiative)
– www.openarchives.org
• Virginia Tech Digital Library Research Laboratory (DLRL, www.dlib.vt.edu)
– 5S, AmericanSouth.Org, CSTC, DL-in-a-box, ENVISION, ETANA, MARIAN, NDLTD, NSDL, OAD, ODL, …

Questions? Discussion?
Thank You!