Digital Libraries: From Theory to Applications in Education and Business
ICADL 2000 – Seoul, Korea
December 7, 2000
Edward A. Fox
fox@vt.edu http://fox.cs.vt.edu
CS
DLRL
Internet TIC
Virginia Tech, Blacksburg, VA, USA
Outline
Introduction
(5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions
Acknowledgements (Selected)
Conference Organizers and Sponsors
 Mentors: JCR Licklider, Michael Kessler, Gerard Salton
 Sponsors: Advance Auto Parts, CNI, DLF, IBM, NLM,
NSF, OCLC, UNESCO, US Dept. of Ed. (FIPSE), …
 VT Faculty/Staff: Tony Atkins, Debra Dudley, John Eaton,
Jim Hicks, Lance Matheson, Gail McMillan, James Powell,
…
 VT Students: Fernando Das Neves, Robert France, Marcos
Goncalves, Neill Kipp, Paul Mather, Ryan Richardson, Ohm
Sornil, Hussein Suleman, Omar Vasnaik, Marc Vass, …
 Visitors: Mann-Ho Lee (Korea), Byongsun Kim (Korea),
Shalini Urs (India), Akira Maeda (Japan)

Internet Technology
Innovation Center
Supported by Virginia’s Center for Innovative Technology
Statewide University Partners - Governing Board:

Christopher Newport University
– William Winter, William Muir, Virginia Electronic Commerce Technology
Center / Southeastern Virginia Network (VECTEC/SEVAnet)

George Mason University
– Scott Martin, Internet Multimedia Center (ICM)
– Steven Ruth, International Center for Applied Studies in IT (ICASIT)

University of Virginia
– Alf Weaver, Internet Commerce Group (InterCom)
– Jim French, Internet Digital Library

Virginia Tech
– Edward Fox, Digital Library Research Laboratory (DLRL), CC, CS
– Scott Midkiff, Center for Wireless Telecomm. (CWT), VTISC, ECpE
JCDL 2001
 First Joint ACM/IEEE Conference on Digital Libraries (+ NSF DLI-2 PI mtg)
http://www.jcdl.org
 June 24-28, 2001 in Roanoke, VA
 Conference Committee:
General Chair: Edward A. Fox, Virginia Tech
 Program Chair: Christine Borgman, UCLA
 Treasurer: Neil Rowe, Naval Postgraduate School
 Posters Chair: Craig Nevill-Manning, Rutgers U.

URLs
 http://fox.cs.vt.edu
 http://www.dlib.vt.edu (DLRL)
 http://ei.cs.vt.edu/~dlib (Courseware)
 www.ndltd.org & www.theses.org
 www.cstc.org (CSTC and JERIC)
 www.openarchives.org (OAI)
 www.jcdl.org (JCDL’2001 – June 24-28)
Collaboration!
U.S. – Korea Joint Workshop on
Digital Libraries
San Diego Supercomputer Center
August 10 & 11, 2000
Sponsored by
National Science Foundation, USA
Ministry of Information & Communication, Korea
Institute of Information Tech. Assessment, Korea
San Diego Supercomputer Center
University of Maryland
Virginia Tech
Workshop Participants (1 of 3)
Robert Allen
University of Maryland
rba@GLUE.UMD.EDU
Dookwon Baik
Korea University
baik@SWSYS2.KOREA.AC.KR
Ching-Chih Chen
Simmons College, Boston
chen@SIMMONS.EDU
Su-Shing Chen
University of Missouri - Columbia
schen@ECN.MISSOURI.EDU
Jonghoon Chun
Myongji University
jchun@WH.MYONGJI.AC.KR
Gregory Crane
Tufts University
gcrane@PERSEUS.TUFTS.EDU
Lois Delcambre
Oregon Graduate Institute
lmd@CSE.OGI.EDU
Edward Fox
Virginia Tech
fox@VT.EDU
Michael Gertz
University of California, Davis
gertz@CS.UCDAVIS.EDU
Stephen Helmreich
New Mexico State University
shelmrei@CRL.NMSU.EDU
Workshop Participants (2 of 3)
Ulf Hermjakob
USC Information Sciences Institute
ulf@ISI.EDU
Soon Joo Hyun
Information & Communications University (ICU)
shyun@ICU.AC.KR
Hyeon Kim
Korea Research & Development Information Center
hyeon@KORDIC.RE.KR
Sung-Hyuk Kim
Sookmyung Women’s University
ksh@SOOKMYUNG.AC.KR
Yongchae Kim
Ministry of Information & Communication
yongari@MIC.GO.KR
Ron Larsen
University of Maryland
rlarsen@DEANS.UMD.EDU
Sang-goo Lee
Seoul National University
sglee@MARS.SNU.AC.KR
Sang Ho Lee
Soongsil University
shlee@COMPUTING.SOONGSIL.AC.KR
Young-Suk Lee
MIT, Lincoln Laboratory
ysl@SST.LL.MIT.EDU
Karl Lo
University of California, San Diego
klo@UCSD.EDU
Workshop Participants (3 of 3)
Bruce Miller
University of California, San Diego
Rbmiller@UCSD.EDU
Sung Been Moon
Yonsei University
sbmoon@YONSEI.AC.KR
Reagan Moore
San Diego Supercomputer Center
moore@SDSC.EDU
Sung Hyon Myaeng
Chungnam National University
shmyaeng@CS.CHUNGNAM.AC.KR
Gang-Tak Oh
National Computerization Agency, Seoul
okt@NCA.OR.KR
Sam-Gyun Oh
SungKyunKwan University
samgyun@YAHOO.COM
samoh@YURIM.SKKU.AC.KR
Hae-Chang Rim
Korea University
rim@NLP.KOREA.AC.KR
Shalini Urs
University of Mysore
shaliniurs@HOTMAIL.COM
Lee Zia
National Science Foundation
lzia@NSF.GOV
Some Observations
So many conferences! Lots of R&D!
 Exhibits: a DL industry is emerging.
 But: we don’t cite each other’s works;
 nobody is asking “Why”;
 we are not connecting theory + projects;
 nobody is talking about OAI.


So, I’ve redone my talk, since you can see:
– paper in proceedings
– demo tomorrow (p. 327) and online
– see tutorial notes (in book) and online
DL = Users Direct
(Organized Artifact Mediated Communication)
[Diagram: a Digital Library surrounded by its users: Author, Teacher, Reader, Learner, Sponsor, Reviewer, Editor, Publisher, Librarian]
DL = Users Direct
(Organized Artifact Mediated Communication)
[Diagram: a Digital Library in a business setting; labels: Parts Supplier, Inventory, Sales Agent, Training, Shopper, Repair Garages, Store, Manuals, B2C Home, Staff, Sales Partners, B2B]
CS 6604: Digital Libraries (Fall 2000)
http://scholar.lib.vt.edu/imagebase/
DL of Images of Birds for Virginia Tech Museum
of Natural History
Student Team
Ameya Datey
Aniket Sule
Supriya Angle
Balaprasuna Chennupati
and the Eagle Scouts
Under the guidance of
Dr. Edward Fox
Ms. Llyn Sharp (VT Museum of Natural History)
Mr. Anthony Atkins (Digital Library and Archives)
Plus, 3-D VTMNH minerals in UH3004
Libraries of the Future
JCR Licklider, 1965, MIT Press:
Unified Theory?
 Not ready in 1960s
 Analog – unified field theory in physics
 “Mess” today – segmented field, specialities
– Database <-> Knowledge <-> Content Mgmnt
– Multimedia, Hypermedia, Hypertext
– Logic, Algebra, Artificial Intelligence, …
 Expensive, annoying for users
– Don’t know where to look
– Don’t know how to use services
5S Layers
Societies
Scenarios
Spaces
Structures
Streams
Definition: Digital Libraries are complex systems that
 help satisfy info needs of users (societies)
 provide info services (scenarios)
 organize info in usable ways (structures)
 present info in usable ways (spaces)
 communicate info with users (streams)
Definition: 5S Framework
 Societies: interacting people (and computers)
 Scenarios: services, functions, operations, methods
 Spaces: domains + constraints (e.g., distance, adjacency): 2D, vector, probability
 Structures: relations, trees, nodes and arcs
 Streams: sequences of items (text, audio, video, network traffic)

(5 Element System: Fire, Wood, Earth, Metal, Water)
5S: Combinations
 Societies + Scenarios = user model
 Societies + Scenarios + Spaces = user interface
 Streams + Structures = markup
 Streams + Structures + Scenarios = object
 Structures + Scenarios = DBMS
Outline
Introduction
(5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions
NSDL Spine
[Diagram of NSDL layers: Portals & Clients; NSDL Services and Other Services; NSDL full-service collections; referenced items & collections; Core Services: Collection Building Services, Usage Services, CI Services]
[Service labels: harvesting, persistence, protocol mediation, annotation, query transform, topic-map, registry, personalization, discussion]
(Slide from Dave Fulker, Bill Arms – 11/2/2000)
ARIADNE Screens (E. Duval)
CS Teaching Center (CSTC)

Instead of building large, expensive multimedia packages,
that become obsolete and are difficult to re-use, concentrate
on small knowledge units.

Learners benefit from having well-crafted modules that
have been reviewed and tested.

Use digital libraries to build a powerful base of support for
learners, upon which a variety of courses, self-study
tutorials & reference resources can be built.

ACM Education Board and SIG support, new NSF grant
with UNCW, Eduprise, TCNJ, … - iLumina Project

ACM J. of Educational Resources in Computing (JERIC)
Browsing (1)
Browsing (2)
A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations, http://www.ndltd.org
(NDLTD – remember: ND LTD / NDL TD)
(also, newer NUDL: Networked University Digital Library, with e-courseware, etc.)
ETD Initiative (and UMI)
[Diagram linking Students, Universities, and UMI]
Students learn about DL, EPub
TDs become more expressive
Global TDs become more accessible, archived
N. Amer. (T)Ds are accessible, archived
Library Catalogs ETD, Access is
Opened to the New Research
WWW
NDLTD
What are the long-term goals?
 Attract all TDs/yr: 50K D-US, 25K D-Germany, 10K TD-Canada, …
 >200K/yr rich hypermedia ETDs that may turn into
electronic portfolios (images, video, audio, …)
 Dramatic increase in knowledge sharing: literature
reviews, bibliographies, …
 Services providing lifelong access for students:
browse, search, prior searches, citation links
 Hundreds/thousands of downloads / year / work
The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
Leader of the Worldwide ETD
(Electronic Thesis and Dissertation) Initiative
Outline
Introduction
(5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions
Why do we need the Open Archives Initiative?
 Current standards are too complicated
 Information wants to be free!
 We can decouple
– Running an archive (DL content collection)
– Running a service (DL system / operation)
 So we can have more and better archives that build on each other
 So we can have better services that work on multiple collections
OAI: Archives of Digital Objects
[Diagram labels: Archive, Access Protocol, Handle (ID), terms and conditions, Digital object]
The Open Archives Initiative
www.openarchives.org
a technical introduction
Hussein Suleman (hussein@vt.edu)
Virginia Tech DLRL
December 2000
History
 Santa Fe Convention (October 1999)
– Electronic pre-print community
 San Antonio (July 2000), Lisbon (Sept. 2000)
– Broader interest from other parties
 Ithaca Meeting (September 2000)
– Formulation of general-purpose protocol
 OAI Open Meetings (January – Feb. 2001)
– Public release of specifications
Federation vs. OAI Harvesting
 Federation
– Sending out queries to remote sites and combining results
 Harvesting
– Gathering all metadata from remote sites into a central search system
– Lightweight protocol
– Robust
– Less network traffic
– Redundant servers
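The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the "remote sites" are in-memory lists, and every name and record here (REMOTE_SITES, federated_search, harvested_search, the titles) is invented for the example rather than taken from any real system.

```python
# Hypothetical in-memory stand-ins for remote archives (names and records invented).
REMOTE_SITES = {
    "site_a": [
        {"id": "a:1", "title": "Digital libraries in education"},
        {"id": "a:2", "title": "Wireless networks"},
    ],
    "site_b": [
        {"id": "b:1", "title": "Theses as digital libraries"},
    ],
}


def search_records(records, term):
    """Case-insensitive substring match on the title field."""
    term = term.lower()
    return [r for r in records if term in r["title"].lower()]


def federated_search(term):
    """Federation: query every remote site at search time and combine results."""
    results = []
    for records in REMOTE_SITES.values():
        # In a real deployment this would be one network round trip per site, per query.
        results.extend(search_records(records, term))
    return results


def harvest_all():
    """Harvesting: gather all metadata once into a central collection."""
    central = []
    for records in REMOTE_SITES.values():
        central.extend(records)
    return central


# Harvest ahead of time; later searches touch only the local copy.
CENTRAL_INDEX = harvest_all()


def harvested_search(term):
    """Search the central copy; no remote site is contacted at query time."""
    return search_records(CENTRAL_INDEX, term)
```

Both functions return the same hits here; the difference is when the remote sites are contacted, which is where the reduced network traffic and robustness of harvesting come from.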
Black Box OAI-ETD Perspective
[Diagram: www.theses.org surrounded by ETD archives]
BN.PT (Portugal), SEALS (S. Africa), OhioLINK, Dissert.Online (Germany), CBUC (Catalunya), CIC, VT, CyberTheses (Francophone), NDC (Greece), MIT, ISTEC (Ibero America), U. Bergen (Norway), ADT (Australia), PhysDis, NSYSU (Taiwan), …
Splitting Data & Services
 Data Provider
– Implements the OAI protocol on an archive to allow external access to data
 Service Provider
– Uses the OAI protocol to access external archives and provide services (such as searching or linking) on their metadata
The Big Picture
DL
Repository 1 Repository 2 Repository 3 Repository 4
Requirements for OAI Protocol
 Unique identifiers (URNs) for each record
 Date-stamp for each record when last modified/created/deleted
 HTTP server with scripting ability
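As a rough illustration of why the first two requirements matter, the sketch below models records with a unique identifier and a date-stamp, then selects what a harvester would need since its last visit. The Record class, the identifier values, and changed_since are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    identifier: str        # unique, URN-style identifier (values below are invented)
    datestamp: date        # when the record was last created, modified, or deleted
    deleted: bool = False  # deletions also carry a date-stamp so harvesters can catch up


RECORDS = [
    Record("urn:example:etd-1", date(2000, 10, 1)),
    Record("urn:example:etd-2", date(2000, 11, 15)),
    Record("urn:example:etd-3", date(2000, 12, 1), deleted=True),
]


def changed_since(records, last_harvest):
    """Return records whose date-stamp is on or after the last harvest date."""
    return [r for r in records if r.datestamp >= last_harvest]
```

With date-stamps in place, a harvester only needs to remember the date of its previous run to pick up new, changed, and deleted records.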
OAI Harvesting Protocol v1
 Operates over HTTP
 HTTP Requests and XML Responses
 HTTP Error codes
 6 Service requests (verbs):
– Identify, ListMetadataFormats, ListSets
– ListIdentifiers, GetRecord, ListRecords
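Since each verb travels as an ordinary HTTP request, a request URL can be assembled with the standard library. This is a sketch under assumptions: BASE_URL is a made-up address, and the GetRecord argument names and the "oai_dc" prefix in the usage line are illustrative values, not something stated on the slide.

```python
from urllib.parse import urlencode

# The six verbs named on the slide.
VERBS = (
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "GetRecord", "ListRecords",
)

# Hypothetical repository address, used only for illustration.
BASE_URL = "http://archive.example.org/oai"


def build_request(verb, **params):
    """Build a request URL of the form BASE_URL?verb=...&key=value."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    query = urlencode({"verb": verb, **params})
    return f"{BASE_URL}?{query}"


# Example usage (argument names and values are assumptions for the example):
identify_url = build_request("Identify")
record_url = build_request(
    "GetRecord", identifier="urn:example:etd-1", metadataPrefix="oai_dc"
)
```

Note that urlencode percent-encodes reserved characters such as the colons in the identifier.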
Identify - Response
ListMetadataFormats - Response
GetRecord - Response
Verb: ListRecords
 Retrieves metadata for multiple records
 Parameters
– from – start date (O)
– until – end date (O)
– set – set to harvest from (O)
– resumptionToken – flow control mechanism (X)
– metadataPrefix – metadata format (R)
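The parameter markers can be turned into a small validation routine. The sketch below reads (R) as required, (O) as optional, and (X) as exclusive, meaning it may not be combined with the other parameters; that reading of the slide's notation is an assumption, and list_records_args is a hypothetical helper name.

```python
def list_records_args(metadataPrefix=None, from_=None, until=None,
                      set_=None, resumptionToken=None):
    """Return the argument dict for a ListRecords request, or raise ValueError.

    Trailing underscores avoid clashing with the Python keyword `from`
    and the built-in `set`.
    """
    if resumptionToken is not None:
        # (X): assumed exclusive, so no other parameter may accompany it.
        if any(v is not None for v in (metadataPrefix, from_, until, set_)):
            raise ValueError("resumptionToken must be the only parameter")
        return {"verb": "ListRecords", "resumptionToken": resumptionToken}

    if metadataPrefix is None:
        # (R): assumed required whenever no resumptionToken is given.
        raise ValueError("metadataPrefix is required")

    args = {"verb": "ListRecords", "metadataPrefix": metadataPrefix}
    # (O): optional parameters are included only when supplied.
    optional = {"from": from_, "until": until, "set": set_}
    args.update({k: v for k, v in optional.items() if v is not None})
    return args
```

For example, list_records_args(metadataPrefix="oai_dc", from_="2000-11-01") yields a dict with verb, metadataPrefix, and from, while passing a resumptionToken together with a date raises ValueError.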
ListRecords - Response
Feature: Different Metadata
Feature: Date Ranges
Feature: Resumption Token
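A harvester's use of the resumption token as flow control might look like the loop below. The XML pages are simplified stand-ins invented for this sketch (the element layout is not the exact OAI response schema), and fake_fetch replaces the real HTTP call so the example runs offline.

```python
import xml.etree.ElementTree as ET

# Simplified, invented response pages keyed by the token that requests them.
PAGES = {
    None: (
        "<ListRecords>"
        "<record><identifier>urn:example:1</identifier></record>"
        "<record><identifier>urn:example:2</identifier></record>"
        "<resumptionToken>page2</resumptionToken>"
        "</ListRecords>"
    ),
    "page2": (
        "<ListRecords>"
        "<record><identifier>urn:example:3</identifier></record>"
        "</ListRecords>"
    ),
}


def fake_fetch(token):
    """Stand-in for an HTTP request; returns the XML page for a token."""
    return PAGES[token]


def harvest(fetch):
    """Collect identifiers from every page, following resumption tokens."""
    identifiers = []
    token = None  # the first request carries no token
    while True:
        root = ET.fromstring(fetch(token))
        identifiers.extend(el.text for el in root.iter("identifier"))
        token_el = root.find("resumptionToken")
        token = token_el.text if token_el is not None else None
        if not token:
            break  # no token (or an empty one) means the list is complete
    return identifiers
```

Passing the fetch function in as a parameter keeps the loop independent of the transport, so the same code could be pointed at a real repository.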
Repository Explorer
ODU Search Service
What Next?
 In General
– Cross-archive searching
– Cross-archive linking, de-duping, threading
– Selective Filtering
– Open-DL in a Box?
 VT
– The VT Digital Library
– NDLTD Union Catalog
the Open Archives Initiative
Herbert Van de Sompel
Cornell University -- Computer Science
[acknowledgements]
Carl Lagoze
DLF FALL FORUM 2000 – Chicago – November 18th 2000
Actions
• establish organizational stability for the OAI:
• institutional backing from CNI & DLF
• steering committee: policy guidance
• technical committee: technical specifications
• executive group: day to day coordination
• workshops: public dissemination, feedback
• revise specifications to allow adoption beyond preprints
herbert van de sompel
low-barrier interop umbrella
[Diagram labels: metadata, e-print, FTXT, OPAC, A&I, image]
low-barrier interop umbrella
[Diagram labels: e-print, metadata, FTXT, OPAC, A&I, image; metadata fields: Author, Title, Abstract, Identifier]
OAI harvesting tools
[Diagram: service provider (harvester) and data provider (repository); repository labels: Datestamp, Identifier, Set, Records]
revision of specifications
• publication of specifications:
• January 2001
• US Open Day, January 23rd Washington DC
• EC Open Day, February 2001, Berlin
• freeze specifications for 1 year:
• stable for experimentation; not definitive
• minimize risk for early adopters
• maximize chances for future interoperability across communities
alpha test of specs (11/2000-01/2001)
• data providers:
• arXiv -- Los Alamos
• NACA -- NASA
• CogPrints -- U Southampton
• ETD -- Virginia Tech
• Theses & Dissertations from WorldCat -- OCLC
alpha test of specs (11/2000-01/2001)
• data providers:
• HeinOnline law journals -- Cornell U
• TEI-lite collection -- U Tennessee
• STM publisher metadata -- U Illinois
• Resource Discovery Network -- UKOLN
• Open Language Archives -- U Pennsylvania
• Open Video Project -- U North Carolina
• Museum info. -- CIMI
alpha test of specs (11/2000-01/2001)
• software:
• OAI harvesting interface to Ex Libris Aleph 500
Integrated Library System -- Ex Libris
• OAI harvester – Cornell U
• OAI harvester – Virginia Tech
• Open-source software capable of creating a
merged catalog of metadata harvested from OAI servers -- OCLC
alpha test of specs (11/2000-01/2001)
• service providers:
• Repository explorer -- Virginia Tech
• MARIAN DL -- Virginia Tech
• ARC service -- Old Dominion U
New OAI mission statement
The Open Archives Initiative develops and promotes
interoperability standards that aim to facilitate the
efficient dissemination of content.
The Open Archives Initiative has its roots in an effort
to enhance access to e-print archives as a means of
increasing the availability of scholarly
communication. Continued support of this work
remains a cornerstone of the Open Archives program.
New OAI mission statement
The fundamental technological framework and
standards that are developing to support this work
are, however, independent of both the type of
content offered and the economic mechanisms
surrounding that content, and promise to have
much broader relevance in opening up access to a
range of digital materials.
[...]
Harvesting Document Metadata
for Federated Search
CS6604 Fall 2000 Project
Presented By
Avnish Kumar Chhabra
Benefits of Harvesting
 Limited storage requirement
 Fast search
 Consistently ranked results
 Improved reliability
 Distributed collections are transparent to user.
 Efficient use of network resources.
Design of the Solution
[Diagram labels: Digital Library collection with OAI wrapper; Parser/Updater; Update Scheduling; New Metadata; MARIAN Metadata Database; Query Generation; Queries and Replies via Z39.50 Wrapper; Boundary of System developed]
Implementation
[Diagram labels]
Schedule File: Server, Protocol, Update Frequency
Main scheduler thread; SiteInfo
OAI harvester class OAIInterface: instantiated with URL of OAI site and scheduling frequency
HarvestorMonitor: monitor for arbitrating access to network resources
OAIHandler: XML document event handler class
DL Collection: Auth, Abs, Sub
Features of the system developed
 Per-collection execution thread
 Schedules updates
 Encapsulation of protocol specific details
 Extensibility
 Control over active execution threads
 Fault tolerance
– Server unreachable
– Failure / timeout of individual connections
 Time zones and date ambiguity considered
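Several of these features (one thread per collection, a cap on active threads, and tolerance of unreachable servers or timeouts) can be sketched with Python's threading module. This is not the project's code: HarvestMonitor, harvest_site, and run_once are hypothetical names, the fetch callables stand in for real network harvests, and scheduling is left to whatever calls run_once at each update interval.

```python
import threading


class HarvestMonitor:
    """Caps how many threads may use the network at once (hypothetical class)."""

    def __init__(self, max_active):
        self._slots = threading.Semaphore(max_active)

    def __enter__(self):
        self._slots.acquire()
        return self

    def __exit__(self, *exc_info):
        self._slots.release()
        return False  # never swallow exceptions


def harvest_site(name, fetch, monitor, results, retries=2):
    """Harvest one collection, retrying on connection failures and timeouts."""
    for _ in range(retries + 1):
        try:
            with monitor:
                results[name] = fetch()
            return
        except OSError as err:  # ConnectionError and TimeoutError are subclasses
            results[name] = f"failed: {err}"


def run_once(sites, max_active=2):
    """Start one thread per collection and wait for all of them to finish."""
    monitor = HarvestMonitor(max_active)
    results = {}
    threads = [
        threading.Thread(target=harvest_site, args=(name, fetch, monitor, results))
        for name, fetch in sites.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A failing site ends up with a "failed: ..." entry instead of stopping the other collections, which is the sense of fault tolerance illustrated here.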
Outline
Introduction
(5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions
MARIAN Layers
[Diagram: Users on top of four layers]
User Interface Layer
User Information Layer
Search Engine Layer
Database Layer
[Side labels: Search Services; Recommendation Services, etc.; Analysis; Indexing; Linking]
MARIAN Mediation Middleware
[Diagram labels: NDLTD/NUDL/Digital Library User; Queries + Results; Local Data Store; Wrapper Generator with 5SL Source Description; one wrapper per collection]
Formats and protocols shown: Dublin Core, SOIF, Harvest protocol, MARC, Open Archives protocol, Z39.50 protocol, RFC1807, Dienst protocol
Collections shown: German PhysDis Collection, VT OAI, Greek Hellenic Dissertations Collection, MIT ETD Collection, ...
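The wrapper idea in the mediation middleware slide can be illustrated with a small sketch: each wrapper hides one collection's native record format behind a common search interface, and the middleware merges the results. The class names, the sample records, and the common result shape are assumptions made for this example, not MARIAN's actual classes. The only outside facts used are that Dublin Core has title and creator elements and that MARC field 245 holds the title statement and field 100 the main personal name.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Translates one collection's native records into a common shape."""

    @abstractmethod
    def search(self, term):
        """Return matches as dicts with 'title', 'author', and 'source' keys."""


class DublinCoreWrapper(Wrapper):
    def __init__(self, records):
        self._records = records

    def search(self, term):
        return [
            {"title": r["title"], "author": r["creator"], "source": "dc"}
            for r in self._records
            if term.lower() in r["title"].lower()
        ]


class MarcWrapper(Wrapper):
    def __init__(self, records):
        self._records = records

    def search(self, term):
        # MARC 245 = title statement, MARC 100 = main entry, personal name.
        return [
            {"title": r["245"], "author": r["100"], "source": "marc"}
            for r in self._records
            if term.lower() in r["245"].lower()
        ]


def mediated_search(wrappers, term):
    """Send one query through every wrapper and merge the uniform results."""
    results = []
    for wrapper in wrappers:
        results.extend(wrapper.search(term))
    return results


# Invented sample data for illustration.
dc = DublinCoreWrapper(
    [{"title": "Digital Library Interoperability", "creator": "A. Author"}]
)
marc = MarcWrapper([
    {"245": "Interoperability of library systems", "100": "B. Writer"},
    {"245": "Bird images", "100": "C. Scribe"},
])
hits = mediated_search([dc, marc], "interoperability")
```

Adding a collection with a new format then means writing one more wrapper, while the user-facing search code stays unchanged.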
Part of Hierarchy of MARIAN Classes
Digital Information Object
Structured Document
English Text
Controlled String
Text
Non-English European Language Text
Korean Text
Person’s Name
Relevant Document Structure
MARIAN-Phronesis
Interoperability
CS6604 Fall 2000 Project
Tracy Lewis
Ryan Richardson
Kim Woods
MARIAN-Phronesis V1 Architectural Diagram
[Diagram labels: MARIAN Search Page; Marian Query; CGI Script; Phron Query; PHRONESIS; Phron Results; CGI Script; Create object instance; Display to user]
MARIAN-Phronesis Login Page
Query in Español
Outline
Introduction
(5S)
Education (CSTC, NDLTD)
OAI
MARIAN
Conclusions
Conclusions
 Education is an important application of DLs
 Having a framework and theory may lead to better (more effective) systems and broader applicability
– 5S
– MARIAN
 Interoperability is part of the DL grand challenge
– OAI