From NDLTD to Technology
for Digital Libraries:
Progress and Challenges
IBM Tokyo Research Lab
July 27, 1999
Edward A. Fox
fox@vt.edu
CC CS DLRL Internet TIC
Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)
 Trip Support: Kobe, NAIST, NEC, Ricoh, ULIS
 Sponsors: ACM, Adobe, IBM, Microsoft, NSF,
OCLC, US Dept. of Education, …
 Co-PIs: Marc Abrams, Robert Akscyn, John
Eaton, Brian Kleiner, Gail McMillan
 Students: Fernando Das Neves, Robert France,
Neill Kipp, Paul Mather, Constantinos
Phanouriou, James Powell, Ohm Sornil, David
Watkins, Chang Zhang, Jianxin Zhao
Virginia Tech Background
 Largest university in Virginia, land-grant, town
population 35K plus 25K students
 Blacksburg Electronic Village, since 1992, with
80% of community on Internet
 Net.Work.Virginia, largest ATM network, with
over 600 sites, for education, research, govt
 LMDS, Local Multipoint Distribution Service,
gigabit wireless networking - 1/3 of Virginia
 Math Emporium, 500 workstations
 Faculty Development Initiative, round 2
Virginia Tech CS
 Department of CS focussed on HCI
 $2M labs: usability, group decisions, info access
 Faculty (+ Abrams, Kafura, Shaffer, …)
– Barfield (ISE - wearable)
– Carroll (design, scenarios)
– Ehrich (equipment, graphics)
– Hartson (theory & methodology, remote evaluation)
– Hix (usability, VR/CAVE)
– Rosson (object orientation/languages, collaboration)
– Williges (ISE - experimentation, meta-evaluation)
ACITC
 Advanced Communications and Information
Technology Center, opening summer 2000
 Connects to the library, with a focus on IT
 1/3 high-tech (multimedia) classrooms
 1/3 digital/electronic library (reading room)
 1/3 research labs: 10, including:
– Digital Library Research Laboratory (DLRL)
– Center for Applied Technologies in the Humanities
– HCI; HPC; Multimedia; Visualization (CAVE), …
– Spaces for industry-supported labs, visitors
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
 DL as industrial success
Digital Libraries --- Objectives
 World Lit.: 24hr / 7day / from desktop
 Integrated “super” information systems: 5S:
streams, structures, spaces, scenarios, societies
 Ubiquitous, Higher Quality, Lower Cost
 Education, Knowledge Sharing, Discovery
 Disintermediation -> Collaboration
 Universities Reclaim Property
 Interactive Courseware, Student Works
 Scalable, Sustainable, Usable, Useful
DLs: Why of Global Interest?
 National projects can preserve antiquities and
heritage: cultural, historical, linguistic, scholarly
 Knowledge and information are essential to
economic and technological growth, education
 DL - a domain for international collaboration
– wherein all can contribute and benefit
– which leverages investment in networking
– which provides useful content on Internet & WWW
– which will tie nations and peoples together more
strongly and through deeper understanding
Why of Interest in Computing?
 Next step in fields of DBMS, HT, IR, MM
 Efficiency requires advances in, e.g.,
– algorithms and data structures (ex., MPHF)
– networking (ex., HTTP-NG)
– OS (ex., support for streams)
 Effectiveness requires advances in, e.g.,
– AI (ex., multilingual texts, user adaptation)
– HCI (ex., visualization, DLs embedded in activities)
 CS Educ. can benefit; CS can aid Dist. Educ.
Digital Libraries --- Virginia Tech
 MARIAN (NLM)
 CS DL Prototype - ENVISION (NSF, ACM)
 TULIP (Elsevier, OCLC)
 BEV History Base (NSF, Blacksburg)
 DL for CS Education - EI (NSF, ACM)
 WATERS, NCSTRL (NSF)
 NDLTD (SURA, US Dept. of Education)
 CSTC (NSF, ACM), CRIM (NSF, SIGMM)
 WCA (Log) Repository (W3C)
 VT-PetaPlex-1 (Knowledge Systems)
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
 DL as industrial success
DLs Shorten the Chain from
Editor
Reviewer
Publisher
A&I
Consolidator
Library
DLs Shorten the Chain to
Editor
Digital
Reviewer
A&I
Library
DLs --- Educational Implications
 Support distance education
 Level the playing field by giving everyone
access to high quality resources
 Enrich learning by giving access to primary
rather than secondary materials/objects
 Integrate simulations, visualizations, and
multi-modal presentations to enhance learning
 Allow learn-at-your-own-pace, ensuring success
 Allow specialized interfaces, or embedded DL
for rich support of learning activities
How do universities and
digital libraries relate?
 Each U. will have its own digital library. Hence
there will be large numbers (i.e., critical mass).
 All students will learn how to use and how to
“feed” digital libraries (and bring those habits
to future work as needs and skills).
 All digital library problems (esp. federation,
flexibility, personalization) appear at U’s (so
they are a good type of testbed, with willing
collaborators in-place for developing solutions).
SMETE Library
(from www.dlib.org)
 Context: Global movement toward Digital
Libraries (see April 1998 CACM)
 NSF effort: Science, Mathematics,
Engineering, and Technology Education
Digital Library (focussed on undergraduates)
– 3 workshops, yearly increasing funds / new calls
 SMETE Library likely to operate as distributed
federation, with separate parts for each key
discipline, and to lead to a global effort
Enhancing Learning with DLs
[Diagram: Enhancing Learning; Digital Libraries; Interactive Experiences]
Enhancing Learning with DLs
[Diagram: Enhancing Learning; Digital Libraries; Interactive Experiences; with the following nodes]
– Adding to Digital Library (student)
– Using Digital Library (direct) (info literacy)
– Other Interactive Learning Activities
– Authoring (text, markup, hypermedia, cataloging-DC)
– Discovering, Browsing, Searching, Retrieving
– Indirectly Using Digital Library (embedded, by agent, ...)
– Submitting Work (ETD) (Metadata, PDF, XML)
– Annotating, Downloading, Installing, Feedback
– Using DL Contents (tools, data sets, env's, courseware, ...)
– Preserving (using stds, migrating, versioning)
– 5S Framework: Societies, Scenarios, Streams, Spaces, Structures
– Collaboration (in/around DL and its artifacts - distance educ.)
Enhancing Learning with DLs
[Diagram: Enhancing Learning; Digital Libraries; Interactive Experiences; with the following projects]
– Student Portfolios
– NDLTD: Networked DL of Theses & Dissertations (60 members; US Dept. Ed., Australia, Germany)
– Interfaces: 2D, 3D, CAVE, IBM, OCLC, ...
– Computer Science
– Other Projects
– ACM Digital Library (www.acm.org)
– "Education Innovation" (NSF - VT; 45 courses)
– Material Sci & Eng (TULIP - Elsevier + 9U's)
– NCSTRL: Technical Reference Library
– CSTC: CS Teaching Center
– NARA: new project (SAIC and a team ...)
– CRIM: Curriculum Resources Inter. MM
– ENVISION (NSF - VT; results visualization)
NSF Education Innovation (EI)
 NSF “Interactive Learning with a Digital
Library in Computer Science” (1993-98)
 45 online courses (esp. Internet, IR, MM,
Professionalism, overall EI project pages):
100+K accesses/wk
 Tools: SWAN (visualization), QUIZIT
 Evaluation
– traditional
– network logging and analysis
– tools for visualization
Digital Library Courseware
 http://ei.cs.vt.edu/~dlib/
 WWW pages or large PDF copy files
 Online quizzes based on book by Michael Lesk
(Morgan Kaufmann Publishers)
 Contents based on book, with several other
popular topics added (e.g., agents)
 Separate pages to supplement: Definitions,
Resources (People, Projects), and References
OUTLINE
 Challenges to CS
 Opportunities for education
– CSTC, CRIM
 5S framework
 NDLTD case study
 Technical progress
 DL as industrial success
CS -> CSTC -> CRIM
 NSF and ACM Education Committee are funding
a 2 year project “A Computer Science Teaching
Center” - CSTC - http://www.cstc.org/
 College of NJ, U. Ill. Springfield, Virginia Tech
 Focus initially on labs, visualization, multimedia
 Multimedia part is also supported by “Curriculum
Resources in Interactive Multimedia” - CRIM -
grant to Virginia Tech and George Washington
University: http://www.cstc.org/~crim/
CS Teaching Center (CSTC)
 Instead of building large, expensive multimedia
packages that become obsolete and are difficult
to re-use, concentrate on small knowledge units.
 Learners benefit from having well-crafted
modules that have been reviewed and tested.
 Use digital libraries to build a powerful base of
support for learners, upon which a variety of
courses, self-study tutorials & reference resources
can be built. (See NSF SMETE-Lib Study at
http://www.dlib.org/smete/public/smete-public.html)
CRIM Rationale
 MM field needs properly trained personnel
 Support this with resources + curricula
 Together these help us move toward a DL
for Interactive MM -> CS -> SMETE
 Benefits will go to teachers (who have more
to build upon) and students (who will have
a richer environment for learning)
Concerns, Problems
 Motivating educators to create modules that can
be used elsewhere is difficult without a suitable
reward structure and an infrastructure of testing,
packaging, discovery, reuse, and evaluation.
 There is an unnecessary disconnect between
researchers (e.g., in laboratories) preparing
exciting demonstrations for conferences and
instructors interested in helping students grasp
underlying concepts and innovations in their
area.
Solutions, Plans
 CSTC will have a variety of focused centers so
that different types of resources can be
collected, tested, and suitably packaged:
– laboratory exercises, activities, assignments
– visualizations and visualization tools
– interactive multimedia resources (CRIM)
 ACM may launch a digital library “Transactions
in Courseware and Education in Computing” to
provide an ongoing infrastructure for CSTC.
CRIM Project Activities
 Workshops, other ways to involve community
 WWW site including DL in CSTC re MM
– Devised cataloging schema, designed interface
– Referring to all MM syllabi and curriculum
– Inviting learning resources for the CRIM DL, with
reviews, reuse certifications
 Publish report on MM curriculum through
ACM and IEEE, after careful review
Dimensions / Categories (matrix+)
 Level: K-12, ugrad (low, upper), grad (MS, PhD), prof.
 Length: reference, short course, course unit, course, …
 Academic orientation: science, engineering, art, communications,
multidisciplinary, marketing
 Pedagogical orientation: mm use, survey/hands-on,
traditional/constructivist (design, develop)
 Tool connection: course on <Photoshop>, compare animation
tools, assume know or can learn, use tool as example
 (Target) audience (and background): culture
 Learning style: visual/auditory, indiv/group
 Relation: closed/complete, part of some structure (e.g., course,
program)
Virginia Tech Courses
 Art: Digital Art and Design course (Photoshop)
 CS: 1604 Introduction to the Internet (1 cr.)
 CS: 3604 Professionalism in Computing
 CS: 4624 Multimedia, Hypertext and Information
Access (3 cr.)
 CS: 5604 Information Storage & Retrieval (3 cr.)
 CS: 6604 Digital Libraries (3 cr.)
CS4624
 Units
– Applications & Authoring
– Capture & Representation
– Compression & Models
– Presentation & Interaction
– Communication & Networking
 Pedagogy
– field trips
– readings & quizzes; exercises in lab; final
– “real” term project in groups
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
 DL as industrial success
How to Build a Digital Library
 Understand the problem (using the 5S
Framework)
 Solve the problem (using the Star
Methodology)
– design, develop, evaluate, refine, operate
Definition: Digital Libraries
are complex systems that
 help satisfy info needs of users (societies)
 provide info services (scenarios)
 organize info in usable ways (structures)
 present info in usable ways (spaces)
 communicate info with users (streams)
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
Definition: 5S Framework
 Societies: interacting people (and computers)
 Scenarios: services, functions, operations,
methods
 Spaces: domains + constraints (e.g., distance,
adjacency): 2D, vector, probability
 Structures: relations, trees, nodes and arcs
 Streams: sequences of items (text, audio,
video, network traffic)
 (5 Element System: Fire, Wood, Earth, Metal, Water)
5S: Components
 Societies: roles, rituals, reasons, relationships,
artifacts
 Scenarios: acquire, index, consult, administer,
preserve
 Spaces: physical, temporal, functional,
presentational, conceptual
 Structures: architectures, taxonomies, schema,
grammars, links, objects
 Streams: granularities, protocols, paths, flows,
turbulences
Star Methodology
Neill Kipp Dissertation
 Training interested groups about 5S and the Star
Methodology, refining the Framework to have
a solid mathematical foundation
 Case studies of projects at Virginia Tech or
involving VT staff/students: CSTC, NDLTD,
NARA (with SAIC), Lexis, ...
 Open also to study DL projects elsewhere
 Focusing too on the design artifacts developed
and related issues of efficient description and
representation (esp. with markup, hypermedia)
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
 DL as industrial success
A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses &
Dissertations (NDLTD) http://www.ndltd.org
ETDs Got Your Interest?
ETD Web Site
http://www.ndltd.org/
Graduate Students
U. Laval
Media
Singapore AM
Chronicle of Higher Ed.
National Public Radio
NY Times ...
Key Ideas:
Scalability
Networked infrastructure
University collaboration
Workflow, automation
Maximal access
Education is the rationale
8th graders vs. grads
Authors must submit
Standards
PDF, SGML, MM
MARC, DC, URNs
Federated search
Status of the Local Project
 Approved by university governance
Spring 1996; required starting 1/1/97
 Submission & access software in place
 Submission workshops for students
(and faculty) occur often: beginner/adv.
 Faculty training as part of Faculty
Development Initiative
 Over 2000 ETDs in collection
What are we doing?
 Aiding universities to enhance grad educ.,
publishing and IPR efforts
 Helping improve the availability and
content of theses and dissertations
 Educating ALL future scholars so they can
publish electronically and effectively use
digital libraries (i.e., are Information
Literate and can be more expressive)
What are the long term goals?
 400K US students / year getting grad
degrees are exposed / involved
 200K/yr rich hypermedia ETDs that
may turn into electronic portfolios
 Dramatic increase in knowledge
sharing: lit. reviews, bibliographies, …
 Services providing lifelong access for
students: browse, search, prior
searches, citation links
Student Prepares Thesis or Dissertation
NDLTD
Literature
Computer Resources
Research
Student Defends and Finalizes ETD
My Thesis
ETD
Student Gets Committee Signatures
and Submits ETD
Signed
Grad School
Graduate School Approves ETD
Student is Graduated
Ph.D.
Library Catalogs ETD and New Students
Have Access to the New Research
WWW
NDLTD
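The submission workflow walked through in the slides above (prepare, defend and finalize, sign and submit, graduate school approval, graduation, library cataloging) can be sketched as a simple linear state machine. The stage names and functions below are illustrative labels chosen for this sketch, not identifiers from the Virginia Tech submission software.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical labels for the ETD workflow steps shown on the slides."""
    PREPARING = 1   # student prepares thesis or dissertation
    DEFENDED = 2    # student defends and finalizes ETD
    SUBMITTED = 3   # committee signatures obtained, ETD submitted
    APPROVED = 4    # graduate school approves ETD
    GRADUATED = 5   # student is graduated
    CATALOGED = 6   # library catalogs ETD; new students have access


def advance(stage: Stage) -> Stage:
    """Move an ETD to the next stage; the workflow is strictly linear."""
    if stage is Stage.CATALOGED:
        raise ValueError("workflow already complete")
    return Stage(stage.value + 1)


def run_workflow():
    """Return the full sequence of stages an ETD passes through."""
    stage = Stage.PREPARING
    history = [stage]
    while stage is not Stage.CATALOGED:
        stage = advance(stage)
        history.append(stage)
    return history
```

A real system would attach checks to each transition (for example, format inspection before approval); those are omitted here.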
Institutional Members
 Coalition for Networked Information (CNI)
Committee on Inst. Coop. (CIC)
Diplomica.com
Dissertation.com
National Library of Portugal
UNESCO
US University Members
 Air University (Alabama)
Cal Tech
Clemson University
College of William & Mary
Concordia University (Illinois)
East Tenn. State University
Florida Institute of Tech.
Florida International University
Michigan Tech
Naval Postgraduate School (CA)
North Carolina State U.
Penn. State University
Rochester Institute of Tech.
U. of Florida
U. of Georgia
University of Hawaii, Manoa
U. of Iowa
U. of Maine
U. of Oklahoma
U. of South Florida
U. of Tennessee, Knoxville
U. of Tennessee, Memphis
U. of Texas at Austin
U. of Virginia
U. Wisconsin - Madison
Vanderbilt U.
Virginia Tech - required since 1/97
West Virginia U. - required beginning fall 1998
Worcester Polytechnic Inst.
Australian Project Members
 U. New South Wales (lead institution)
U. of Melbourne
U. of Queensland
U. of Sydney
Australian National University
Curtin U. of Technology
Griffith U.
German Project Members
 Humboldt University (lead institution)
 3 other universities
 5 learned societies
 1 computing center
 2 major libraries
Other International Members
 Chinese University of Hong Kong
Chungnam National U., Dept of CS (S. Korea)
City University, London (UK)
Darmstadt U. of Tech. (Germany)
Free University of Berlin (Germany - Vet. Med.)
Gyeongsang National U. (Korea)
Indian Institute of Technology, Bombay (India)
Nanyang Technological U. (Singapore, part)
National U. of Singapore (Singapore, part)
*National Library of Portugal
Polytechnic University of Valencia (Spain)
Rhodes U. (South Africa)
St. Petersburg St. Tech.U (Russia)
Univ. de las Américas Puebla (Mexico)
U. Laval; U. of Guelph; U. Waterloo; Wilfrid Laurier U. (Canada)
NUDL
 1/15/99
NUDL proposal to NSF under DLI2
international program
– VT: Library, Grad School, Industrial&Systems Eng.
– Partners: UK (2) , Singapore, Russia, Korea, Greece,
Germany, plus Iberoamerican group (Spain,
Portugal, Argentina, Brazil, Chile, Mexico)
– Problems: Multilingual search, multimedia
submissions, requirements/usability, …
 Start
with ETDs, then expand to other student
works, portfolios, data sets, (CS) courseware, ...
National Coverage (red/white)
NUDL Partners
Ricardo A. Baeza-Yates, Universidad de Chile, Chile
José Luis Brinquete Borbinha, Biblioteca Nacional, Portugal
José Hilario Canós Cerdá, Universidad Politécnica de Valencia, Spain
Stavros Christodoulakis, Technical University of Crete, Greece
Lautaro Guerra Genskowsky, Universidad Técnica Federico Santa Maria, Chile
Juan José Goldschtein, Universidad de Belgrano, Argentina
Peter Diepold, Humboldt University, Germany
Francisco Javier Jaén Martinez, Spain
Sung Hyon Myaeng, Chungnam National University, Korea
Ana Maria Beltran Pavani, Prédio Cardeal Leme, Brazil
Lim Ee Peng, Nanyang Technological University, Singapore
Alexander I. Plemnek, St.-Petersburg State Technical University, Russia
J. Alfredo Sánchez, Universidad de las Américas-Puebla, Mexico
Access Statistics
                               1996      1997       1998
Total successful requests:     37,171    247,573    628,401
Av. successful requests/day:   102       685        1,690
Requests for .PDF files:       4,600     72,854     343,236
Requests for .HTML files:      28,225    129,831    215,896
Distinct hosts served:         9,015     22,725     36,724
Total data transferred:        3,229M    25,953M    74,051M
Av. data transferred/day:      9M        73M        222M
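The trend in the access statistics above is easier to see as ratios. The following sketch uses only the request and host counts reported on the slide; the dictionary layout and function names are assumptions made for illustration.

```python
# Yearly counts copied from the Access Statistics slide.
STATS = {
    1996: {"requests": 37_171, "pdf": 4_600, "html": 28_225, "hosts": 9_015},
    1997: {"requests": 247_573, "pdf": 72_854, "html": 129_831, "hosts": 22_725},
    1998: {"requests": 628_401, "pdf": 343_236, "html": 215_896, "hosts": 36_724},
}


def pdf_share(year: int) -> float:
    """Fraction of successful requests that were for PDF files."""
    return STATS[year]["pdf"] / STATS[year]["requests"]


def growth(metric: str, start: int, end: int) -> float:
    """Ratio of a metric in the end year to the same metric in the start year."""
    return STATS[end][metric] / STATS[start][metric]


if __name__ == "__main__":
    for year in sorted(STATS):
        print(year, f"PDF share: {pdf_share(year):.1%}")
    print("request growth 1996-1998:", round(growth("requests", 1996, 1998), 1))
```

On these figures, PDF requests rise from roughly an eighth of all successful requests in 1996 to more than half in 1998, while total requests grow about seventeen-fold.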
Popular Works 1996
458 Seevers, Gary L. Identification of Criteria for Delivery of Theological Education Through
Distance Education: An International Delphi Study (Ph.D., Educational Research and
Evaluation, April 1993; 1353Kb)
432 Hohauser, Robyn Lisa. The Social Construction of Technology: The Case of LSD (MS in
Science and Technology Studies, Feb. 1995; 244Kb)
390 Childress, Vincent William. The Effects of Technology Education, Science, and
Mathematics Integration Upon Eighth Grader's Technological Problem-Solving Ability (Ph.D.
in Vocational and Technical Education, July 1994; 285Kb)
310 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS
Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2Mb)
287 Sprague, Milo D. A High Performance DSP Based System Architecture for Motor Drive
Control ( MS in Electrical Engineering, May 1993; 878Kb)
165 Wallace, Richard A. Regional Differences in the Treatment of Karl Marx by the Founders
of American Academic Sociology (MS in Sociology, Nov. 1993; 479Kb)
150 McKeel, Scott Andrew. Numerical Simulation of the Transition Region in Hypersonic
Flow (Ph.D. in Aerospace Engineering, Feb. 1996; 3Mb)
Popular Works 1997
9920 Liu, Xiangdong. Analysis and Reduction of Moire Patterns in Scanned Halftone Pictures
(Ph.D. in Computer Science, May 1996; 6.6Mb)
7656 Petrus, Paul. Novel Adaptive Array Algorithms and Their Impact on Cellular System
Capacity (Ph.D. in Electrical Engineering, March 1997; 5Mb)
2781 Agnes, Gregory Stephen. Performance of Nonlinear Mechanical, Resonant-Shunted
Piezoelectric, and Electronic Vibration Absorbers for Multi-Degree-of-Freedom Structures
(Ph.D. in Engineering Mechanics, Sept. 1997; ? + 7926Kb)
2492 Gonzalez, Reinaldo J. Raman, Infrared, X-ray, and EELS Studies of Nanophase Titania
(Ph.D. in Physics, July 1996; 4607Kb)
1877 Shih, Po-Jen. On-Line Consolidation of Thermoplastic Composites (Ph.D. in Engineering
Mechanics, Feb. 1997; 3.3Mb)
1791 Saldanha, Kevin J. Performance Evaluation of DECT in Different Radio Environments
(MS in Electrical Engineering, Aug. 1996; 3.2Mb)
1431 DeVaux, David. A Tutorial on Authorware (MS in CS, April 1996; 2.3Mb)
1394 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS
Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2518Kb)
Popular Works 1998
K-accesses  Mbytes  Degree  Year  Dept     Tables/Figures   Author
75          12      PhD     1997  ME       38/174           Maillard
56          6.5     PhD     1996  CS       8/93             Liu
20          3.9     PhD     1997  EE       9/121            Laster
15          4.9     PhD     1997  CpE      17/127           Tripathi
12          6.6     MS      1997  EE       7/96             Nicoloso
6.7         4.6     PhD     1996  Physics  8/62 (32 color)  Gonzalez
International Use
Country          1996    1997    1998
United Kingdom   850     2992    8170
Australia        608     2501    4223
Germany          346     2378    7373
Canada           713     2367    3970
South Korea      387     1264    2201
France           463     1161    4431
Italy            250     725     2553
Netherlands      191     867     2781
Brazil           183     1130    1449
Thailand         22      967     1089
Greece           83      958     1414
Who are sponsors / cooperators?
 Funding, Donations of hardware/software
– SURA
– US Dept. of Education (FIPSE)
– Adobe Systems
– IBM
– Microsoft
– OCLC
 Others Serving on Steering Committee
– National/Regional Projects: Australia, French
speaking group, Germany, IberoAmerica
(ISTEC), UK (UTOG)
– CGS, National Lib. Canada, NSF, OAS,
SOLINET, UMI, UNESCO, ...
How can a university get
involved?
 Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
 Send us letter, give us contact names
 Adapt Virginia Tech solution
– Build interest and consensus
– Start trial / allow optional submission
Build Local ETD Site
ETD
Workshop/Training
Digital Library
Policies
Inspection/Approval
Type 1 Members
University Requires ETDs
 Adobe Acrobat and/or XML/SGML tools
 Automated submission & processing
 Archive/access through UMI, (OCLC,)
Virginia Tech, ...
 (Local) WWW site, publicity
 (Local) Assistance provided as requested:
email, phone, listserv(s)
Type 2 Members
University Agrees to Require ETDs
 Like Type 1 but set date not reached
 Usually has an option or pilot
 May: wait for new AY; start with all who
enter after; …
 Build grass roots support
– Advisory committee: representative? expert?
– Champions to spread by word of mouth
– Approval: Senates, Commissions, Deans, Students
– Publicity to reach community
NDLTD Members, Types 3-7
3.
Part of university requires ETDs
4. University allows ETDs
5. University investigating, has pilot
6. University consortium joins:
– CIC (Big 10 coordinating body)
7.
Non-university organization joins
– CNI (Coalition for Networked Info.)
Everyone Learns
Students become “info literate”
Students learn about discovery, search,
categorization/classification, e-pub,
preservation, helping others find/reuse
Campus starts to think about IPR
– e.g., Virginia Tech symposium
Faculty and students improve quality as
reader base expands
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
– NDLTD services
 DL as industrial success
User Search Support
(multilingual, XML)
NDLTD World Federated
Search
User
Interface
Virginia Tech ...
(univ)
UMI ...
(corporate)
CIC ...
(univ group)
Portuguese NL ...
(national lib)
Australia
(regional)
Note: All groups shown are connected with NDLTD.
www.theses.org
 James Powell student project, D-Lib
Magazine description in Sept. 1998
 XML description of each site
– type of search engine / service
– language
– coverage (for resource discovery)
 Adding Z39.50 gateway capability
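The per-site XML description listed above (search engine type, language, coverage) can be illustrated with a small sketch. The element names, attribute names, and site entries below are hypothetical; the actual www.theses.org description format is not given in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical site descriptions; names and values are placeholders.
SITES_XML = """
<sites>
  <site name="Example University A">
    <engine>web-form</engine>
    <language>en</language>
    <coverage>theses and dissertations</coverage>
  </site>
  <site name="Example National Library B">
    <engine>z39.50</engine>
    <language>pt</language>
    <coverage>national thesis collection</coverage>
  </site>
</sites>
"""


def load_sites(xml_text: str):
    """Parse site descriptions into a list of plain dictionaries."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "name": site.get("name"),
            "engine": site.findtext("engine"),
            "language": site.findtext("language"),
            "coverage": site.findtext("coverage"),
        }
        for site in root.findall("site")
    ]


def sites_for_language(sites, lang: str):
    """Resource discovery step: pick the sites worth querying for a language."""
    return [s["name"] for s in sites if s["language"] == lang]


SITES = load_sites(SITES_XML)
```

A federated search front end could use such descriptions to decide which sites to query and which protocol adapter to use for each.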
Interoperability Testing
 IBM DL: donated equipment, technical
support, powerful IPR (see TOIS, D-Lib)
 Z39.50: OCLC SiteSearch / VT tailored s/w
– university libraries w. catalogs of freely shared
MARC records pointing to archival copies
– via URNs: handles & PURLs
 Dienst / NCSTRL - www.ncstrl.org: CS depts.,
DARPA, NSF, CNRI, Cornell - UVA is
working on extensions for ETDs - Portugal is
studying use for Europe - VT is working on
Dienst to Z39.50 gateway
Access Approaches
 Goal: Maximize access and services, e.g.,
by encouraging:
 UMI centralized services
 Distributed service: Dienst, Z39.50
 Regional services (e.g., OhioLink, AZ/NM)
 Local servers with browse, search
– From local catalogs to local archives
 WWW robot indexing and search services
Support Services Developed
 WWW site with > 300 Mb, CD, videotape
 Automated submission system (MySQL,
UNIX, WWW scripts - grad school/library)
 Student guidelines, style sheets, multimedia
training materials, FAQs, press info
 SGML and XML DTDs for ETDs
 SGML to HTML (web generator)
 LaTeX, Word templates, converters
Support Offered
 Software, documentation, tech support
 Email, listservs (etd-l@listserv.vt.edu,
eval, -grad, -library, -technical)
 Donations: Adobe, Microsoft
 Evaluation: instruments, analysis
http://scholar.lib.vt.edu - solutions/statistics
 (Temporary storage / archiving; aid - in
setting up an int’l service & archive)
Enhancements
 Dublin Core spec, MARC crosswalk
 DTDs for SGML, XML(+ <discipline>ML)
 Annotation system (author, friends, notes)
 Routing system (based on Sift)
 Multilingual WWW site, training materials
(Spanish recently done in Valencia)
 Better federated search (w. Z39.50, planned
with Dienst and Harvest - maybe MARIAN)
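A Dublin Core to MARC crosswalk, as listed among the enhancements above, is essentially a mapping table from metadata elements to MARC tags and subfields. The sketch below is illustrative only: the tag choices follow commonly published crosswalk tables as an assumption, and are not the NDLTD specification; the sample record is invented.

```python
# Assumed mapping from unqualified Dublin Core elements to (MARC tag, subfield).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
}


def crosswalk(dc_record: dict):
    """Convert a DC record to a sorted list of (tag, subfield, value) tuples.

    Elements without an entry in the mapping table are skipped.
    """
    fields = []
    for element, value in dc_record.items():
        if element in DC_TO_MARC:
            tag, code = DC_TO_MARC[element]
            fields.append((tag, code, value))
    return sorted(fields)


# Invented example record for an ETD.
RECORD = {
    "title": "An Example Thesis",
    "creator": "Doe, Jane",
    "date": "1999",
    "identifier": "http://example.org/etd/1",
    "format": "application/pdf",  # not in the table above, so it is dropped
}
MARC_FIELDS = crosswalk(RECORD)
```

A production crosswalk would also handle repeated elements, indicators, and qualified Dublin Core, which this sketch leaves out.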
Further Services
 Adding services currently prototyped
– support with IBM DL, OCLC SiteSearch
 Adding other services planned
– building and using citation database (w. SFX)
– implementing plagiarism check (like “SCAM”)
 Developing NUDL as a sustainable self-governing
global institution (w. committees)
Other Work
 Working with publishers to increase level of
access as much as possible
 Interoperability tests among universities and
with UMI to provide integrated services
 Study with testbed that emerges, to improve
information retrieval, browsing, interface,
and other types of user support
 Evaluation, improving learning experience,
spread to worldwide initiative, sustainable
support and coordination
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
– Networking
 DL as industrial success
Network Research Group
 NSF 3 year grant on WWW logging,
characterization, and optimization: Abrams,
Fox, Pollard (CNS)
 Core member of Web Characterization
Activity of World-Wide Web Consortium
 Providing DL (with OCLC) to support WCA
(at http://www.cs.vt.edu/repository/):
– logs
– tools
– publications
NRG Tools
WebJamma: Artificial HTTP traffic generator
WebWatcher: HTTP traffic monitoring and logging system
CLFmunge: Anonymizes common log format
HTTPdump: Protocol decode for tcpdump
Caching proxy simulator
Splus programs
Log description and validation interface & routines
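To make the idea of anonymizing common log format (CLF) entries concrete, here is a minimal sketch. It is not the CLFmunge tool itself, whose behavior is not described in these slides; the salt, token format, and function name are assumptions. The host is replaced by a salted hash so that requests from the same client can still be grouped, and the user fields are blanked.

```python
import hashlib
import re

# CLF: host ident authuser [date] "request" status bytes
CLF_LINE = re.compile(
    r'^(\S+) (\S+) (\S+) (\[[^\]]+\]) ("[^"]*") (\d{3}) (\S+)$'
)


def anonymize(line: str, salt: str = "example-salt") -> str:
    """Replace the client host with a stable token and drop user identity."""
    match = CLF_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a common log format line")
    host, _ident, _user, when, request, status, size = match.groups()
    token = hashlib.sha256((salt + host).encode("utf-8")).hexdigest()[:12]
    return f"host-{token} - - {when} {request} {status} {size}"


# Invented sample entry using a documentation-range IP address.
SAMPLE = (
    '192.0.2.10 - jdoe [27/Jul/1999:10:00:00 +0900] '
    '"GET /etd/index.html HTTP/1.0" 200 2326'
)
ANON = anonymize(SAMPLE)
```

Keeping the salt private matters: without it, the small IPv4 address space makes hashed hosts easy to reverse by enumeration.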
DL Submission Software
 Similar software developed for WCA, CSTC,
and NDLTD
 CSTC version field-tested to manage papers
for ACM Digital Libraries ‘99
 May generalize for
– conferences
– electronic journal
– resource description (e.g., courses, Web content)
Dissertations
 Abdulla (completed)
– collected diversity of Web logs
– analyzed EI logs re educational use
– Fourier analysis, self similarity
 Suleman (starting)
– WCA implementation
– dynamic documents
patterns,
regularities
classes, templates, OIDs, variable data
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
– Interfaces
 DL as industrial success
Accessibility Activities / Plans
 Interface design (simple, 3D, VR)
 Usability studies
 Generic multi-lingual support
 Support for those with disabilities
 Hybrid collection (paper, MARC,
abstracts, full-text, multimedia)
 Disciplinary classifications, tools
 Visualization of results, collection
SPIRE Visualization
CAVE Experiments
 Use a familiar metaphor
– building / floor / room / shelf / book
 Rearrange orderings / shelving
– use categories, clustering, ranking
– use visualization: colors and gaps
– study space mappings: physical, logical
 Simplify movement for key tasks
ENVISION
 NSF “A User-Centered Database from the
Computer Science Literature” (1991-93)
 Collected bib/typesetter data, converted to SGML
 Scanned thousands of page images
 MARIAN search engine - can be made available
(also applied to the Virginia Tech library catalog)
used as part of a prototype object-based DL, with
tailored visualization interface (L. Nowell
dissertation)
Envision Results Window
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
– MARIAN
 DL as industrial success
MARIAN
 Multiple Access
Retrieval of
Information with ANnotations
 (Musical: Marian the Librarian …)
 Evolved from 1980’s CODER system to
a distributed Online Public Access
Catalog (OPAC), then DL backend,
now becoming a full DL system
 From C/C++ to Java by Jianxin Zhao
 Future uses: NDLTD, NUDL, PetaPlex
MARIAN Layers
User
User
User
User Interface Layer
User Information Layer
Search Engine Layer
Database Layer
User
MARIAN Testing Architecture
[Diagram: Load Generator; Webgate; Java Server; C/C++ Server]
MARIAN Parallelism
[Figure: Java part response time vs. query rate comparison (type 1 requests). Axes: response time (ms), 0-4000, vs. query rate (#/min), 0-500. Series: all modules in one machine; one "webgate"; two "webgate"s; four "webgate"s.]
MARIAN Response Time
[Figure: Four "webgate"s, decomposed time delay vs. query rate. Axes: time delay (ms), 0-4000, vs. query rate (#/min), 0-500. Series: system; after Java server.]
France Dissertation
 Key developer since CODER
 Applying computational linguistics
efforts with machine readable
dictionaries
 Applying opportunistic handling of
term lists for ranking, usable displays
(“to be or not to be, that is the”)
 Developing and evaluating variety of
interfaces
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
– PetaPlex
 DL as industrial success
PetaPlex
 Digital Library Machine (“super” object store)
 Parallel computer / storage utility for scale of 1000
to 100,000,000 gigabytes (1 Tbyte - 100 Pbyte)
 Knowledge Systems Incorporated is supplying
VT-PetaPlex-1 for $250,000 with
– high speed backbone connection (OC-12)
– 2.5 terabytes through 100 “Nanoservers”:
– Each = Network connection + IBM 25GB disk +
233 MHz Pentium II + Linux
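The capacity and cost figures quoted above can be checked with a few lines of arithmetic. The sketch uses only numbers from the slide (100 nanoservers, 25 GB per disk, $250,000) and decimal units (1 TB = 1000 GB), matching the slide's "1000 gigabytes = 1 Tbyte"; the helper function is hypothetical and ignores replication and overhead.

```python
NANOSERVERS = 100          # from the slide
DISK_GB = 25               # IBM 25GB disk per nanoserver
SYSTEM_COST_USD = 250_000  # quoted price of VT-PetaPlex-1

total_gb = NANOSERVERS * DISK_GB          # raw capacity in gigabytes
total_tb = total_gb / 1000                # decimal terabytes, as on the slide
cost_per_gb = SYSTEM_COST_USD / total_gb  # system cost spread over raw capacity


def servers_needed(target_gb: int, disk_gb: int = DISK_GB) -> int:
    """Nanoservers required for a target raw capacity (ceiling division)."""
    return -(-target_gb // disk_gb)


if __name__ == "__main__":
    print(total_tb, "TB raw;", cost_per_gb, "USD per GB")
    print("nanoservers for 1 PB:", servers_needed(1_000_000))
```

This reproduces the slide's 2.5 terabytes and gives a system cost of $100 per gigabyte; at the same disk size, a petabyte would need 40,000 nanoservers.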
PetaPlex Approach
 Extend work on KMS from 1970s
 Achieve qualitative improvement in quality of
hypertext
– sub-second response
– with terabyte and petabyte scale stores
 Do everything with one seek - through hashing
over a very large storage space
– support URN access as primitive service
– support name / repository model for digital library
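The "one seek through hashing" idea above can be sketched as follows: hash the object's URN to pick both a nanoserver and a block on that server's disk, so that a lookup needs no directory traversal. This is an assumed illustration, not the PetaPlex design; the slot count, the URN, and the class are hypothetical, and disks are simulated with in-memory dictionaries.

```python
import hashlib

NUM_SERVERS = 100              # from the slides: 100 nanoservers
SLOTS_PER_SERVER = 1_000_000   # hypothetical number of disk blocks per server


def locate(urn: str):
    """Map a URN directly to (server, slot) with a single hash computation."""
    digest = hashlib.sha256(urn.encode("utf-8")).digest()
    value = int.from_bytes(digest[:8], "big")
    server = value % NUM_SERVERS
    slot = (value // NUM_SERVERS) % SLOTS_PER_SERVER
    return server, slot


class Store:
    """In-memory stand-in for the nanoserver disks."""

    def __init__(self):
        self.disks = [dict() for _ in range(NUM_SERVERS)]

    def put(self, urn: str, obj: bytes) -> None:
        server, slot = locate(urn)
        # A slot models one disk block; URNs that collide share the block.
        self.disks[server].setdefault(slot, {})[urn] = obj

    def get(self, urn: str):
        server, slot = locate(urn)
        # One block read (one "seek") on one server answers the lookup.
        return self.disks[server].get(slot, {}).get(urn)


STORE = Store()
STORE.put("urn:example:etd-0001", b"thesis bytes")
```

Because the location is computed rather than looked up, URN access can be offered as a primitive service, which is the point made on the slide.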
PetaPlex Complex
[Diagram: FRONT END MACHINE (RS/6000, 1G RAM, 4 Proc.) with Service Machines 1-4 and an array of Nanoservers]
PetaPlex Service Machines
 Small object server
 Large object server
– video on demand
– streaming audio
 Information retrieval server
 Proxy / cache server (e.g., 1 terabyte server
of 1000 worldwide for Comsat/Intelsat)
PetaPlex Top View
[Diagram: 4 ft. per side]
PetaPlex Side View
[Diagram: 8 ft. high, 4 ft. wide; many shelves / side. Roles: support, cooling, power]
PetaPlex Cost Goals, Approach
 Maximize number of seeks achievable
 Maximize % of cost invested in disks
 Maximize flexibility and reliability
 Minimize cost per unit of storage
 Approach “information utility”
 Increase throughput and reliability by
replicating on other PetaPlex systems
 Use robotics, wireless, and commodity
production of nanoservers
Sornil & Mather Dissertations
 Proposing 50 Tbyte wireless PetaPlex for $2M
 Mather: efficiently handling very large numbers
of objects of varying sizes
 Sornil: efficiently handling IR for very large
collections, large numbers of users, high
transaction rates, large inverted files
– modeling and simulation
– data organization
– parallelization of algorithms, alone and in
combination for retrieval (related) tasks
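The inverted files mentioned above are the core data structure for this kind of retrieval. The sketch below is a toy, single-machine version assumed for illustration; it does not reflect the dissertation work, which concerns large-scale organization and parallelization. The documents and function names are invented.

```python
from collections import defaultdict


def build_index(docs: dict):
    """Build an inverted file: term -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query: str):
    """Return ids of documents containing every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, set()) for term in terms]
    # Intersect the shortest posting list first to keep intermediate sets small.
    postings.sort(key=len)
    result = set(postings[0])
    for plist in postings[1:]:
        result &= plist
    return sorted(result)


# Invented sample collection.
DOCS = {
    1: "digital library of theses",
    2: "digital signal processing",
    3: "library catalog search",
}
INDEX = build_index(DOCS)
```

At the scale discussed on the slide, the posting lists themselves become very large, which is why their partitioning across machines is a research question in its own right.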
OUTLINE
 Challenges to CS
 Opportunities for education
 5S framework
 NDLTD case study
 Technical progress
 DL as industrial success
DL Challenges
Preservation - so people will trust DLs
Affordable storage - so DLs will be
universally used
DL industry - critical mass by covering
libraries, archives, museums, corporate
info, govt info, personal info - “quality
WWW” integrating IR, HT, MM, ...
DLs: Broad Impact
 DLs should be in companies
 DLs should be in government (integrated)
 DLs should be built for all data generated - all
types of data, information, and knowledge
– covering content/knowledge management
– covering data mining, IR, discovery, visualization
– promoting specialized work on all types of
collections for all types of user groups
 5S framework and university-industry
collaboration may help move us to these goals!