Involving New Scholars in
Digital Libraries through the
Networked Digital Library of
Theses and Dissertations
(NDLTD)
Telcordia, Morristown, NJ
December 30, 1999
Edward A. Fox
fox@vt.edu
CC CS DLRL Internet TIC
Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)

Sponsors: ACM, Adobe, IBM, Microsoft, NSF, OCLC,
US Dept. of Education, …

Co-PIs: Marc Abrams, Robert Akscyn, John Carroll,
John Eaton, Gail McMillan

Students: Fernando Das Neves, Robert France, Marcos
Goncalves, Neill Kipp, Paul Mather, Constantinos
Phanouriou, James Powell, Ohm Sornil, David
Watkins, Chang Zhang, Jianxin Zhao, …

Information Systems at Virginia Tech – see online full
version of short introduction to networking efforts, at
http://rdweb.cns.vt.edu/talks/VT-Initiatives-9-99.ppt
(by Jeff Crowder and Erv Blythe)
OUTLINE
Introduction
Digital libraries
NDLTD case study
Members, statistics
Relationships, universities
Access, software, hardware
Conclusion
Virginia Tech Background
 Largest university in Virginia, land-grant, town
population 35K plus 25K students, #2 in football
 Blacksburg Electronic Village, since 1992, with
80% of community on Internet
 Net.Work.Virginia, largest ATM network, with
over 600 sites, for education, research, govt
 LMDS, Local Multipoint Distribution Service,
gigabit wireless networking - 1/3 of Virginia
 Math Emporium, 500 workstations
 Faculty Development Initiative, round 2
Virginia Tech CS



Department of CS focused on HCI since 1994
$2M (NSF RI) labs: usability, group decisions, info access
Faculty (+ Kafura – OO/real-time, Head)
– Abrams (Network Research Group, UIML – user interface)
– Barfield (ISE - wearable)
– Bowman (virtual environments and interface issues)
– Carroll (design, scenarios, education, BEV)
– Ehrich (equipment, graphics, BEV)
– Hartson (theory & methodology, remote evaluation)
– Hix (usability, VR/CAVE)
– Ramakrishnan (data mining, recommender systems)
– Rosson (object orientation/languages, collaboration)
– Shaffer (problem solving environments, education, GIS)
– Williges (ISE - experimentation, meta-evaluation)
ACITC
 Advanced Communications and Information
Technology Center, opening summer 2000
 Connects to the library, with a focus on IT
 1/3 high-tech (multimedia) classrooms
 1/3 digital/electronic library (reading room)
 1/3 research labs: 10, including:
– Digital Library Research Laboratory (DLRL)
– Center for Applied Technologies in the Humanities
– HCI; HPC; Multimedia; Visualization (CAVE), …
– Spaces for industry-supported labs, visitors
Definition: Digital Libraries
are complex systems that
 help satisfy info needs of users (societies)
 provide info services (scenarios)
 organize info in usable ways (structures)
 present info in usable ways (spaces)
 communicate info with users (streams)
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
Definition: 5S Framework
 Societies: interacting people (and computers)
 Scenarios: services, functions, operations,
methods
 Spaces: domains + constraints (e.g., distance,
adjacency): 2D, vector, probability
 Structures: relations, trees, nodes and arcs
 Streams: sequences of items (text, audio,
video, network traffic)

(5 Element System: Fire, Wood, Earth, Metal, Water)
5S: Components
 Societies: roles, rituals, reasons, relationships,
artifacts
 Scenarios: acquire, index, consult, administer,
preserve
 Spaces: physical, temporal, functional,
presentational, conceptual
 Structures: architectures, taxonomies, schema,
grammars, links, objects
 Streams: granularities, protocols, paths, flows,
turbulences
Neill Kipp Dissertation
 Training interested groups about 5S and the Star
Methodology, refining the Framework to have a solid
mathematical foundation
 Case studies of projects at Virginia Tech or involving
VT staff/students: CSTC, NDLTD, NARA (National
Archives, with SAIC), Lexis, ...
 Open also to study DL projects elsewhere
 Focusing too on the design artifacts developed and
related issues of efficient description and representation
(esp. with markup, hypermedia)
Digital Libraries --- Virginia Tech
 MARIAN (NLM)
 CS DL Prototype - ENVISION (NSF, ACM)
 TULIP (Elsevier, OCLC)
 BEV History Base (NSF, Blacksburg)
 DL for CS Education - EI (NSF, ACM)
 WATERS, NCSTRL (NSF)
 NDLTD (SURA, US Dept. of Education)
 CSTC (NSF, ACM), CRIM (NSF, SIGMM)
 WCA (Log) Repository (W3C)
 VT-PetaPlex-1 (Knowledge Systems)
Digital Library Courseware
 http://ei.cs.vt.edu/~dlib/
 WWW pages or large PDF copy files
 Online quizzes based on book by Michael Lesk
(Morgan Kaufmann Publishers)
 Contents based on book, with several other
popular topics added (e.g., agents)
 Separate pages to supplement: Definitions,
Resources (People, Projects), and References
CS -> CSTC -> CRIM
NSF and ACM Education Committee are funding a 2-year
project, “A Computer Science Teaching Center” (CSTC): http://www.cstc.org/
 College of NJ, U. Ill. Springfield, Virginia Tech
 Focus initially on labs, visualization, multimedia
 Multimedia part is also supported by a 2nd grant to
Virginia Tech and The George Washington University:
http://www.cstc.org/~crim/ (with curricular guidelines
also under development)
 ACM will help provide reward for contributors, through
Journal of Computing Education: Resources and
Research (JoCERR?)
Browsing (2)
DLs Shorten the Chain from
Editor
Reviewer
Publisher
A&I
Consolidator
Library
DLs Shorten the Chain to
Author
Teacher
Digital
Reader
Editor
Reviewer
Learner
Librarian
Library
Enhancing Learning with DLs
Enhancing Learning / Digital Libraries
Adding to Digital Library (student)
Interactive Experiences Using Digital Library (direct) (info literacy)
Other Interactive Learning Activities
Authoring (text, markup, hypermedia, cataloging-DC)
Discovering, Browsing, Searching, Retrieving
Indirectly Using Digital Library (embedded, by agent, ...)
Submitting Work (ETD) (Metadata, PDF, XML)
Annotating, Downloading, Installing, Feedback
Using DL Contents (tools, data sets, env's, courseware, ...)
Preserving (using stds, migrating, versioning)
5S Framework: Societies, Scenarios, Streams, Spaces, Structures
Collaboration (in/around DL and its artifacts, distance educ.)
A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD), http://www.ndltd.org
ETDs Got Your Interest?
ETD Web Site
http://www.ndltd.org/
Graduate Students
U. Laval
Media
Singapore AM
Chronicle of Higher Ed.
National Public Radio
NY Times ...
Key Ideas:
Scalability
Networked infrastructure
University collaboration
Workflow, automation
Maximal access
Education is the rationale
8th graders vs. grads
Authors must submit
Standards
PDF, SGML, MM
MARC, DC, URNs
Federated search
What are we doing?
 Aiding universities to enhance grad educ.,
publishing and IPR efforts
 Helping improve the availability and
content of theses and dissertations
 Educating ALL future scholars so they can
publish electronically and effectively use
digital libraries (i.e., are Information
Literate and can be more expressive)
What are the long term goals?
 400K US students / year getting grad
degrees are exposed / involved
 200K/yr rich hypermedia ETDs that
may turn into electronic portfolios
 Dramatic increase in knowledge
sharing: lit. reviews, bibliographies, …
 Services providing lifelong access for
students: browse, search, prior
searches, citation links
What led to today’s meeting?
 1987 mtg in Ann Arbor: UMI, VT, …
 1992 mtg in Washington: CNI, CGS, UMI, VT and 10
universities with 3 reps each
 1993 mtg in Atlanta to start Monticello Electronic
Library (MEL): SURA, SOLINET
 1994 mtg in Blacksburg re ETD project: std of PDF +
SGML + multimedia objects
 1996 funding by SURA, US Dept. of Education
(FIPSE) for regional, national projects
 1997 meetings in UK, Germany, ...
 Sept. 1999 meeting in Paris at UNESCO Headquarters
http://www.unesco.org/webworld/etd/
Status of the Local Project
 Approved by university governance Spring
1996; required starting 1/1/97
 Submission & access software in place
 Submission workshops for students (and
faculty) occur often: beginner/adv., focused on
Adobe PDF and multimedia formats
 Faculty training as part of Faculty
Development Initiative
 Over 2000 ETDs in collection
Library Costs
 $12/vol. for paper thesis processing
– catalog, bind, security strip, label, and shelving
– @950 vols./yr. = $11.4K
 $3.20/vol. ETD processing
– cataloging @950 vols./yr. = $3040
 $.07/vol. Shelving (save 166 ft/yr)
 $.04/vol. Circulation (of 3000 copies/yr)
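The per-volume processing figures above can be checked with a short Python sketch. It only redoes the arithmetic on the slide (950 volumes per year at $12 and at $3.20); the variable and function names are illustrative, and the shelving and circulation figures are left out because the slide does not say how they combine with the others.

```python
# Annual thesis-processing cost, using the per-volume figures from this slide.
VOLUMES_PER_YEAR = 950          # volumes processed per year (from the slide)
PAPER_COST_PER_VOL = 12.00      # catalog, bind, security strip, label, shelve
ETD_COST_PER_VOL = 3.20         # cataloging only


def annual_cost(cost_per_volume: float, volumes: int = VOLUMES_PER_YEAR) -> float:
    """Return the yearly processing cost in dollars, rounded to cents."""
    return round(cost_per_volume * volumes, 2)


paper_total = annual_cost(PAPER_COST_PER_VOL)
etd_total = annual_cost(ETD_COST_PER_VOL)
difference = round(paper_total - etd_total, 2)

print(f"paper: ${paper_total:,.2f}  ETD: ${etd_total:,.2f}  difference: ${difference:,.2f}")
```

The two totals match the $11.4K and $3040 quoted on the slide; the difference between them is derived here and is not a figure stated in the source.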
Student Prepares Thesis or Dissertation
NDLTD
Literature
Computer Resources
Research
Student Defends and Finalizes ETD
My Thesis
ETD
Student Gets Committee Signatures
and Submits ETD
Signed
Grad School
Graduate School Approves ETD
Student is Graduated
Ph.D.
Library Catalogs ETD and New Students
Have Access to the New Research
WWW
NDLTD
Institutional Members
 Coalition for Networked Information (CNI)
 Committee on Institutional Cooperation (CIC)
 Diplomica.com
 Dissertation.com
 Dissertationen Online (Germany)
 Ibero-American Science & Technology Education
Consortium (ISTEC, www.istec.org)
 National Library of Portugal (for all universities)
 Organization of American States (SEDI/OAS)
 UNESCO (www.unesco.org/webworld/etd)
US University Members (35+)
Air University (Alabama)
Brigham Young University
Cal Tech
Clemson University
College of William & Mary
Concordia University (Illinois)
East Carolina University
East Tenn. State University
Florida Institute of Tech.
Florida International University
George Washington University
Marshall University (W. Va.)
Miami U. of Ohio
MIT (in process)
Michigan Tech
Naval Postgraduate School (CA)
North Carolina State U.
Penn. State University
Rochester Institute of Tech.
U. of Colorado Health Sci. Cntr.
U. of Florida
U. of Georgia
University of Hawaii, Manoa
U. of Iowa
U. of Maine
U. of Oklahoma
U. of South Florida
U. of Tennessee, Knoxville
U. of Tennessee, Memphis
U. of Texas at Austin
U. of Virginia
U. Wisconsin - Madison
Vanderbilt U.
Virginia Tech - required since 1/97
West Virginia U. - required fall 1998
Worcester Polytechnic Inst.
Australian Project Members
U. New South Wales (lead institution)
U. of Melbourne
U. of Queensland
U. of Sydney
Australian National University
Curtin U. of Technology
Griffith U.
German Project Members
 Humboldt University (lead institution)
 3 other universities
 5 learned societies
– Mathematics, Physics, Chemistry, Sociology,
Education
 1 computing center
 2 major libraries
CBUC (www.cbuc.es, Spain)
 Consorci de Biblioteques Universitàries de
Catalunya, as group, with 9 members:
– Universitat de Barcelona
– Universitat Autònoma de Barcelona
– Universitat Politècnica de Catalunya
– Universitat Pompeu Fabra
– Universitat de Girona
– Universitat de Lleida
– Universitat Rovira i Virgili
– Universitat Oberta de Catalunya
– Biblioteca de Catalunya
Other International Members
Chinese University of Hong Kong
Chungnam National U., Dept of CS (S. Korea)
City University, London (UK)
Darmstadt U. of Tech. (Germany)
Free University of Berlin (Germany - Vet. Med.)
Gyeongsang National U. (Korea)
Indian Institute of Technology, Bombay (India)
Nanyang Technological U. (Singapore, part)
National U. of Singapore (Singapore, part)
Polytechnic University of Valencia (Spain)
Rhodes U. (South Africa)
St. Petersburg St. Tech.U (Russia)
Univ. de las Américas Puebla (Mexico)
Univ. of Alicante (Spain)
Univ. of Pisa (Italy)
U. Laval; U. of Guelph; U. Waterloo; Wilfrid Laurier U. (Canada)
NDLTD Members
[Chart: Number of Members (0 to 80) by Date Joined, 3/11/97 through 11/11/99]
Popular Works 1996
458 Seevers, Gary L. Identification of Criteria for Delivery of Theological Education Through
Distance Education: An International Delphi Study (Ph.D., Educational Research and
Evaluation, April 1993; 1353Kb)
432 Hohauser, Robyn Lisa. The Social Construction of Technology: The Case of LSD (MS in
Science and Technology Studies, Feb. 1995; 244Kb)
390 Childress, Vincent William. The Effects of Technology Education, Science, and
Mathematics Integration Upon Eighth Grader's Technological Problem-Solving Ability (Ph.D.
in Vocational and Technical Education, July 1994; 285Kb)
310 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS
Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2Mb)
287 Sprague, Milo D. A High Performance DSP Based System Architecture for Motor Drive
Control ( MS in Electrical Engineering, May 1993; 878Kb)
165 Wallace, Richard A. Regional Differences in the Treatment of Karl Marx by the Founders
of American Academic Sociology (MS in Sociology, Nov. 1993; 479Kb)
150 McKeel, Scott Andrew. Numerical Simulation of the Transition Region in Hypersonic
Flow (Ph.D. in Aerospace Engineering, Feb. 1996; 3Mb)
Popular Works 1997
9920 Liu, Xiangdong. Analysis and Reduction of Moire Patterns in Scanned Halftone Pictures
(Ph.D. in Computer Science, May 1996; 6.6Mb)
7656 Petrus, Paul. Novel Adaptive Array Algorithms and Their Impact on Cellular System
Capacity (Ph.D. in Electrical Engineering, March 1997; 5Mb)
2781 Agnes, Gregory Stephen. Performance of Nonlinear Mechanical, Resonant-Shunted
Piezoelectric, and Electronic Vibration Absorbers for Multi-Degree-of-Freedom Structures
(Ph.D. in Engineering Mechanics, Sept. 1997; ? + 7926Kb)
2492 Gonzalez, Reinaldo J. Raman, Infrared, X-ray, and EELS Studies of Nanophase Titania
(Ph.D. in Physics, July 1996; 4607Kb)
1877 Shih, Po-Jen. On-Line Consolidation of Thermoplastic Composites (Ph.D. in Engineering
Mechanics, Feb. 1997; 3.3Mb)
1791 Saldanha, Kevin J. Performance Evaluation of DECT in Different Radio Environments
(MS in Electrical Engineering, Aug. 1996; 3.2Mb)
1431 DeVaux, David. A Tutorial on Authorware (MS in CS, April 1996; 2.3Mb)
1394 Kuhn, William B. Design of Integrated, Low Power, Radio Receivers in BiCMOS
Technologies (Ph.D. in Electrical Engineering, Dec. 1995; 2518Kb)
Usage of ETDs in VT Collections
                    1996      1997      1998      1999 (Jan-Aug)
Total requests      37,171    247,537   465,974   907,104
Daily requests      102       685       1,722     3,121
Abstract requests   25,829    112,633   177,647   143,056
Hosts served        9,015     22,725    28,022    52,663
International Use
 1996
 850
 608
 346
 713
 387
 463
 250
 191
 183


22
83
1997
2992
2,501
2378
2367
1264
1161
725
867
1130
967
958
1998
8170 United Kingdom
4223 Australia
7373 Germany
3970 Canada
2201 South Korea
4431 France
2553 Italy
2781 Netherlands
1449 Brazil
1089 Thailand
1414 Greece
Relationship with publishers
 Concern of faculty and students that still
wish to publish books or journal articles,
voiced: campus, Chronicle, NPR, Times
 Solution: Approval Form gives students,
faculty choices on access, when to change
access condition; use IPR controls in DL
 Solution: by case, work with publishers and
publisher associations to increase access
– AAP, AAUP
– AAAS, ACM, ACS, Elsevier, ...
Some responses from publishers
 ACM: need to acknowledge copyright
 Elsevier: need to acknowledge copyright
 IEEE-CS: endorse initiative
 ACS: After first publication, can release
 Textbook publishers: different market,
manuscript significantly reworked
 General: restricting access to local campus
will not cause any problems
 Survey by Joan Dalton, Canada
For professional societies
 Like “writing across the curriculum”
 Besides writing: computing/communications,
information literacy, personal digital library
management, tool use, research methods,
collaboration, archiving/preservation
 Data sets, communities of users of them
 Classification systems / browsing / searching
 National Research Council (NRC) booklet “On
becoming a researcher in the digital age”
Who are sponsors / cooperators?
 Funding, Donations of hardware/software
– SURA
– US Dept. of Education (FIPSE)
– Adobe Systems
– IBM
– Microsoft
– OCLC
 Others Serving on Steering Committee
– National/Regional Projects: Australia, French
speaking group, Germany, IberoAmerica (ISTEC),
UK (UTOG)
– Council of Graduate Schools, National Lib. Canada,
NSF, OAS, SOLINET, UMI, UNESCO, ...
How does this relate to UMI?
(Bell and Howell)
 1987
UMI workshop to explore ETDs
 Support letter for US Dept. of Ed. proposal
 Steering and technical committee
membership
 ProQuest Direct pilot of scanning works
started 1/1/97, free 2 yr access to front part
 Collaborating on:
–
–
accepting electronic author submissions
standards (e.g., representation), research
ETD Initiative (and UMI)
Students
Learn about
DL, EPub
TDs
become more
expressive
Global TDs
become more
accessible,
archived
Universities
UMI
N. Amer. (T)Ds are
accessible, archived
How can a university get involved?
 Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
 Send us letter, give us contact names
 Adapt Virginia Tech solution
– Build interest and consensus
– Start trial / allow optional submission
Convene Local Planning Group
ETD
Build Local ETD Site
ETD
Workshop/Training
Digital Library
Policies
Inspection/Approval
Type 1 Members
University Requires ETDs
 Adobe Acrobat and/or XML/SGML tools
 Automated submission & processing
 Archive/access - UMI, (OCLC, Center for
Research Libraries, Virginia Tech,) ...
 (Local) WWW site, publicity
 (Local) Assistance provided as requested:
email, phone, listserv(s)
Type 2 Members
University Agrees to Require ETDs
 Like Type 1 but set date not reached
 Usually has an option or pilot
 May: wait for new AY; start with all who
enter after; …
 Build grass roots support
– Advisory committee: representative? expert?
– Champions to spread by word of mouth
– Approval: Senates, Commissions, Deans, Students
– Publicity to reach community
NDLTD Members, Types 3-7
 3.
Part of university requires ETDs
 4. University allows ETDs
 5. University investigating, has pilot
 6. University consortium joins:
– CBUC (Catalunya, Spain)
 7.
Non-university organization joins
– CNI (Coalition for Networked Info.)
– ISTEC, OAS, UNESCO, …
Everyone Learns
 Students become “info literate”
 Students learn about discovery, search,
categorization/classification (e.g., CoRC), e-pub
(e.g., XML, multimedia, hypertext), preservation,
helping others find/reuse
 Campus starts to think about IPR
– e.g., Virginia Tech symposium
http://www.rgs.vt.edu/resmag/seminars.html
 Faculty and students improve quality as reader
base expands
User Search Support
(multilingual, XML)
NDLTD World Federated
Search
User
Interface
Virginia Tech ...
(univ)
UMI ...
(corporate)
OhioLink ...
(univ group)
Portuguese NL ...
(national lib)
Australia
(regional)
Note: All groups shown are connected with NDLTD.
www.theses.org
 James Powell student project, D-Lib Magazine
description in Sept. 1998
 XML description of each site
– type of search engine / service
– language
– coverage (for resource discovery)
 Adding Z39.50 gateway capability and
integrating with MARIAN, along with Harvest
and Open Archives protocols (according to Santa
Fe Convention) – see www.openarchives.org
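As an illustration of the per-site XML description listed above, the following Python sketch parses a small site registry and picks the sites whose declared language matches a request. The element and attribute names (`site`, `engine`, `language`, `coverage`) and the two sample entries are hypothetical; the slides do not give the actual schema used by www.theses.org.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: one <site> per member collection, recording the
# three properties named on the slide (search engine type, language, coverage).
REGISTRY_XML = """
<sites>
  <site name="Site A">
    <engine>fulltext</engine>
    <language>en</language>
    <coverage>theses and dissertations</coverage>
  </site>
  <site name="Site B">
    <engine>catalog</engine>
    <language>pt</language>
    <coverage>national theses</coverage>
  </site>
</sites>
"""


def load_sites(xml_text):
    """Parse the registry into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": site.get("name"),
            "engine": site.findtext("engine"),
            "language": site.findtext("language"),
            "coverage": site.findtext("coverage"),
        }
        for site in root.findall("site")
    ]


def sites_for_language(sites, language):
    """Return the names of sites that declare the given language code."""
    return [s["name"] for s in sites if s["language"] == language]


sites = load_sites(REGISTRY_XML)
print(sites_for_language(sites, "pt"))
```

A federated search front end could use such a registry to decide which sites to query and how to phrase the query for each engine type; that routing step is not shown here.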
Open Archives Initiative
 Santa Fe meeting, Oct. 21-22, 1999
 Workshop early June, San Antonio, DL’00
 LANL, CNI, DLF, Mellon, …
 Convention
 Archives -> Open Archives
– Support unique archive identifiers
– Implement Open Archives Metadata Set (DC-based, using XML)
– Implement Dienst harvesting interface
– Register the archive
 Build tools, layer other services: linking, searching, …
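To make the DC-based metadata item concrete, here is a minimal Python sketch that serializes a thesis description as Dublin Core elements in XML. It uses the standard Dublin Core element namespace; the wrapping `record` element, the helper name `dc_record`, and the identifier value are assumptions for illustration and are not the actual Open Archives Metadata Set. The title and author come from the Popular Works 1997 list in this talk.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build a <record> element holding one dc:* child per (name, value) pair."""
    record = ET.Element("record")            # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = dc_record([
    ("identifier", "example-archive/0001"),  # hypothetical unique archive identifier
    ("title", "A Tutorial on Authorware"),
    ("creator", "DeVaux, David"),
    ("type", "MS thesis"),
    ("date", "1996-04"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A harvesting interface would return many such records per request; the record shape is the only part sketched here.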
Approaches to Open Archives
Build By institution
Build By
discipline
Author
Category
Interdisciplinary
Year
Language
Query …
Open Archives Members
 Original Participants in the Open Archives Initiative
– Caroline Arms, Library of Congress
– Leslie Carr, University of Southampton
– Mark Doyle, American Physical Society
– Dale Flecker, Harvard University
– Edward A. Fox, Virginia Tech
– Michael Friedman, HighWire Press, Stanford University
– Paul M. Gherman, Vanderbilt University
– Paul Ginsparg, Los Alamos National Laboratory & xxx
– Stevan Harnad, University of Southampton
– Thomas Krichel, University of Surrey & RePEc
– Carl Lagoze, Cornell University
– Rick Luce, Los Alamos National Laboratory
– Clifford Lynch, Coalition for Networked Information
– Kurt Maly, Old Dominion University
– Michael L. Nelson, NASA Langley Research Center
– John Ober, California Digital Library
– Bob Parks, Washington University & EconWPA
– Herbert Van de Sompel, University of Ghent
– Eric F. Van de Velde, California Institute of Technology
– Don Waters, The Andrew W. Mellon Foundation
– Ken Weiss, California Digital Library
 Others Joining (selected)
– University of Virginia – Jim French, Worthy Martin, Thornton Staples
– NEC Research Institute - C. Lee Giles and Steve Lawrence
– Internet Archive - Kurt Bollacker, Marlita Kahn
– India - University of Mysore – Shalini Urs
– Mexico – University of Monterrey - David Garza Salazar
Access Approaches
 Goal: Maximize access and services, e.g., by
encouraging:
 UMI centralized services
 Distributed service: Dienst, Z39.50, …
 Regional services (e.g., OhioLink)
 Global service: Open Archives
 Local servers with browse, search
– From local catalogs to local archives
 WWW robot indexing and search services
Access Possibilities
[Diagram nodes: Web search engines; www.theses.org;
www.openarchives.org; library catalog clients; Virginia Tech; MIT;
National Library of Portugal; CBUC (Spain); OhioLink;
3rd Party Services (e.g., Bell & Howell); National Projects: AU, GE, …]
Support Services Developed
 WWW site with > 300 Mb, CD, videotape
 Automated submission system (MySQL,
UNIX, WWW scripts - grad school/library)
 Student guidelines, style sheets, multimedia
training materials, FAQs, press info
 SGML and XML DTDs for ETDs
 SGML to HTML (web generator)
 LaTeX, Word templates, converters
Support Offered
 Software, documentation, tech support
 Email, listservs (etd-l@listserv.vt.edu,
eval, -grad, -library, -technical)
 Donations: Adobe, Microsoft
 Evaluation: instruments, analysis
http://scholar.lib.vt.edu - solutions/statistics
 (Temporary storage / archiving; aid - in
setting up an int’l service & archive)
Enhancements
 Dublin Core spec, MARC crosswalk
 DTDs for SGML, XML(+ <discipline>ML)
 Annotation system (author, friends, notes)
 Routing system (based on Sift)
 Multilingual WWW site, training materials
 Better federated search (w. Z39.50, planned
with Dienst and Harvest - using MARIAN)
 Integrate SFX, CiteSeer (citation database
and linking, plagiarism detection)
Accessibility Activities / Plans
 Interface design (simple, 3D, VR)
 Usability studies
 Generic multi-lingual support
 Support for those with disabilities
 Hybrid collection (paper, MARC,
abstracts, full-text, multimedia)
 Disciplinary classifications, tools
 Visualization of results, collection
SPIRE Visualization
CAVE Experiments
 Use a familiar metaphor
– building / floor / room / shelf / book
 Rearrange orderings / shelving
– use categories, clustering, ranking
– use visualization: colors and gaps
– study space mappings: physical, logical
 Simplify movement for key tasks
ENVISION
 NSF “A User-Centered Database from the
Computer Science Literature” (1991-93)
 Collected bib/typesetter data, converted to SGML
 Scanned thousands of page images
 MARIAN search engine - can be made available
(also applied to the Virginia Tech library catalog)
used as part of a prototype object-based DL, with
tailored visualization interface (L. Nowell
dissertation)
Envision Results Window
MARIAN
 Multiple Access Retrieval of
Information with ANnotations
 (Musical: Marian the Librarian …)
 Evolved from 1980’s CODER system to
a distributed Online Public Access
Catalog (OPAC), then DL backend,
now becoming a full DL system
 From C/C++ to Java by Jianxin Zhao
 Future uses: NDLTD, NUDL, PetaPlex
MARIAN Layers
User
User
User
User Interface Layer
User Information Layer
Search Engine Layer
Database Layer
User
MARIAN Testing Architecture
Load
Generator
Webgate
Java
Server
C/C++
Server
MARIAN Parallelism
[Chart: Java part response time vs. query rate comparison (type 1 requests).
x-axis: query rate (#/min), 0 to 500; y-axis: response time (ms), 0 to 4000.
Series: all modules in one machine; one "webgate"; two "webgate"s; four "webgate"s]
MARIAN Response Time
[Chart: Four "webgate"s, decomposed time delay vs. query rate.
x-axis: query rate (#/min), 0 to 500; y-axis: time delay (ms), 0 to 4000.
Series: system; after Java server]
France Dissertation
 Key developer since CODER
 Applying computational linguistics
efforts with machine readable
dictionaries
 Applying opportunistic handling of
term lists for ranking, usable displays
(“to be or not to be, that is the”)
 Developing and evaluating variety of
interfaces
PetaPlex
 Digital Library Machine (“super” object store)
 Parallel computer / storage utility for scale of 1000
to 100,000,000 gigabytes (1 Tbyte - 100 Pbyte)
 Knowledge Systems Incorporated supplied
VT-PetaPlex-1 for $250,000 with
– high speed backbone connections (eventually 1 Gbps)
– 2.5 terabytes through 100 “Nanoservers”:
– Each = Network connection + IBM 25GB disk +
233 MHz Pentium II + Linux
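The capacity and price figures above can be cross-checked with a few lines of Python. This is only arithmetic on the numbers quoted on the slide, using decimal units (1 terabyte = 1000 gigabytes); the cost-per-gigabyte figure is derived here and is not stated in the source.

```python
# Figures quoted on the slide for VT-PetaPlex-1.
NANOSERVERS = 100
DISK_GB_PER_NANOSERVER = 25
SYSTEM_PRICE_USD = 250_000

GB_PER_TB = 1000                 # decimal units, matching "1000 gigabytes (1 Tbyte)"
GB_PER_PB = 1000 * GB_PER_TB

total_gb = NANOSERVERS * DISK_GB_PER_NANOSERVER
total_tb = total_gb / GB_PER_TB
price_per_gb = SYSTEM_PRICE_USD / total_gb

# Design range for the architecture: 1000 to 100,000,000 gigabytes.
range_low_tb = 1000 / GB_PER_TB
range_high_pb = 100_000_000 / GB_PER_PB

print(f"{total_tb} TB total, ${price_per_gb:.0f} per GB")
print(f"design range: {range_low_tb} TB to {range_high_pb} PB")
```

The computed total agrees with the 2.5 terabytes on the slide, and the design range agrees with the "1 Tbyte - 100 Pbyte" parenthetical.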
PetaPlex Complex
[Diagram: FRONT END MACHINE (RS/6000, 1G RAM, 4 Proc.) with Service
Machines 1-4 and an array of Nanoservers]
PetaPlex Service Machines
 Front-end provides handle/repository
abstraction through hashing
 Small object server
 Large object server
– video on demand
– streaming audio
 Information retrieval server
 Proxy / cache server (e.g., 1 terabyte server
of 1000 worldwide for Comsat/Intelsat)
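The front-end bullet describes mapping handles to storage through hashing. A minimal Python sketch of that idea follows, assuming a fixed pool of 100 nanoservers and a simple hash-modulo placement; the function name, the handle string, and the placement rule are illustrative assumptions, not the PetaPlex implementation.

```python
import hashlib

NANOSERVER_COUNT = 100   # VT-PetaPlex-1 is described as having 100 nanoservers


def nanoserver_for(handle: str, servers: int = NANOSERVER_COUNT) -> int:
    """Map an object handle to a nanoserver index in [0, servers).

    A stable digest (rather than Python's per-process hash()) is used so
    that the same handle always lands on the same server across restarts.
    """
    digest = hashlib.sha256(handle.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % servers


index = nanoserver_for("hypothetical-handle/etd-0001")
print(index)
```

Plain modulo placement moves most objects when the server count changes; a design that grows over time would likely need something like consistent hashing, which the slides do not discuss.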
PetaPlex Top View
[Diagram: 4 ft. per side]
PetaPlex Side View
[Diagram: 15 shelves, 8 ft. high, 4 ft. wide. Roles: support, cooling, power]
PetaPlex Cost Goals, Approach
 Maximize number of seeks achievable
 Maximize % of cost invested in disks
 Maximize flexibility and reliability
 Minimize cost per unit of storage
 Approach “information utility”
 Increase throughput and reliability by
replicating on other PetaPlex systems
 Use robotics, wireless, and commodity
production of nanoservers
Sornil & Mather Dissertations
 Proposing 100 Tbyte wireless PetaPlex for $2M
 Mather: efficiently handling very large numbers
of objects of varying sizes
 Sornil: efficiently handling IR for very large
dynamic collections, large numbers of users,
high transaction rates, large inverted files
– modeling and simulation
– data organization
– parallelization of algorithms, alone and in
combination for retrieval (related) tasks
Problems Addressed
THE PETAPLEX-TYPE ARCHITECTURE
Early Results
PARTITIONING SCHEMES: A PRELIMINARY ANALYSIS
 To preliminarily study effects of three important parameters
on the performance of the partitioning schemes
 Term selection characteristic
 Number of queries in the system
 Number of disk nodes
 To have a preliminary performance trend of the Hybrid
partitioning scheme
 Inverted File Partitioning Schemes:
 Term Partitioning
 Document Partitioning
 Hybrid Partitioning
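The first two schemes can be sketched in Python on a toy index. In term partitioning each disk node holds the complete posting lists for a subset of terms; in document partitioning each node holds a local index over a subset of documents. The round-robin and modulo assignment rules below are assumptions chosen for brevity, and the Hybrid scheme is omitted because the slides do not define it.

```python
from collections import defaultdict

# Toy inverted file: term -> sorted list of document ids containing it.
INDEX = {
    "digital": [1, 2, 4],
    "library": [1, 3],
    "thesis": [2, 3, 4],
}


def term_partition(index, nodes):
    """Assign whole posting lists to nodes, round-robin over sorted terms."""
    parts = [dict() for _ in range(nodes)]
    for i, term in enumerate(sorted(index)):
        parts[i % nodes][term] = list(index[term])
    return parts


def document_partition(index, nodes):
    """Give each node the postings for documents with doc_id % nodes == node."""
    parts = [defaultdict(list) for _ in range(nodes)]
    for term, postings in index.items():
        for doc_id in postings:
            parts[doc_id % nodes][term].append(doc_id)
    return [dict(p) for p in parts]


def lookup(parts, term):
    """Query every node and merge the partial posting lists."""
    hits = []
    for part in parts:
        hits.extend(part.get(term, []))
    return sorted(hits)


by_term = term_partition(INDEX, 2)
by_doc = document_partition(INDEX, 2)
print(lookup(by_term, "thesis"), lookup(by_doc, "thesis"))
```

Under term partitioning a single-term query is answered by one node, while under document partitioning every node contributes a shorter partial list. That difference is one reason the number of disk nodes and the term selection pattern matter in the analysis above.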
Early Results
Effects of Number of Disk Nodes
Skewed Term Selection
(with 1024 queries in the system)
Uniform Term Selection
(with 1024 queries in the system)
DL Challenges
 Preservation - so people will trust DLs
 Supporting infrastructure – affordable storage in
large capacity, very fast networks, ...
 Scalability, sustainability, interoperability
 DL industry - critical mass by covering libraries,
archives, museums, corporate info, govt info,
personal info - “quality WWW” integrating IR,
HT, MM, ...
– Need tools & methods to make them easier to build