2002ICADLprojects - Edward A. Fox

University Electronic Publishing through Digital Libraries:
Courseware, Theses and Dissertations
Singapore - Dec. 2002
Edward A. Fox
fox@vt.edu http://fox.cs.vt.edu
CS, DLRL, Internet TIC, NDLTD, CITIDEL, NSDL …
Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)
• Sponsors: ACM, Adobe, IBM, Microsoft, NSF (Grants
CDA-9312611; DUE-0121741, 0136690, 0121679; IIS-0080748, 0086227, 0002935, and 9986089), OCLC,
SOLINET, UNESCO, US Dept. Ed. (FIPSE), VTLS, …
• Faculty/Staff (now): Boots Cassel, Debra Dudley, Lee
Giles, Rex Hartson, John Impagliazzo, Deborah Knox,
JAN Lee, Kurt Maly, Gail McMillan, Manuel Perez,
Muhammad Zubair, …
• Students: Fernando Das Neves, Marcos Goncalves, Paul
Mather, Ryan Richardson, Priya Shivakumar, Hussein
Suleman, Wensi Xi, …
• UNESCO Analytical Survey: Leonid Kalinichenko
Outline
• Case Study: NDLTD
• Case Study: CSTC
• Case Study: CITIDEL
• Interoperability: OAI, ODL
• Conclusions
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD)
http://www.ndltd.org
The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
Leader of the Worldwide ETD
(Electronic Thesis and Dissertation) Initiative
NDLTD: Grad Program, IT, Library, Ed. (Tech)
Key Ideas:
Scalability
Networked infrastructure
University collaboration
Workflow, automation
Education is the rationale
Maximal access
8th graders vs. grads
Authors must submit
Standards: PDF, SGML, MM, MARC, DC, URNs
Federated search
What led to today’s meeting?
• 1987 mtg in Ann Arbor: UMI, VT, …
• 1992 mtg in Washington: CNI, CGS, UMI, VT and 10
universities with 3 reps each
• 1993 mtg in Atlanta to start Monticello Electronic Library
(regional, US Southeast): SURA, SOLINET
• 1994 mtg at VT: std: PDF + SGML + multimedia objects
• 1996 funding by SURA, US Dept. of Education (FIPSE)
• 1997 meetings in UK, Germany, ...
• 1998 – 1st symposium – Memphis (20)
• 1999 – 2nd symposium – Blacksburg (70)
• 2000 – 3rd symposium – St. Petersburg (225)
• 2001 – 4th symposium – Caltech (200)
• 2002 – 5th symposium – BYU; 2003 – Berlin; 2004 – Kentucky
What are the long term goals?
• 400K US students / year getting grad degrees are
exposed / involved
• 200K/yr rich hypermedia ETDs that may turn into
electronic portfolios (images, video, audio, …)
• Dramatic increase in knowledge sharing: literature
reviews, bibliographies, …
• Services providing lifelong access for students:
browse, search, prior searches, citation links
• Hundreds/thousands of downloads / year / work
Convene Local Planning Group
Build Local ETD Site
Workshop/Training (digital library, policies, inspection/approval)
Student Prepares Thesis/Dissertation (NDLTD, literature, computer resources, research)
Student Defends & Finalizes ETD
Student Gets Committee Signatures and Submits ETD
Graduate School Approves ETD, Student is Graduated
Library Catalogs ETD, Access is Opened to the New Research (WWW, NDLTD)
National / Regional Projects
• Australia
• U. New South Wales (lead)
• U. of Melbourne
• U. of Queensland
• U. of Sydney
• Australian National U.
• Curtin U. of Technology
• Griffith U.
• Germany
• Humboldt University (lead)
• 3 other universities
• 5 learned societies: Math,
Physics, Chemistry,
Sociology, Education
• 1 computing center
• 2 major libraries
• OhioLINK: 79 colleges/univs
• Consorci de Biblioteques
Universitàries de Catalunya,
as group, www.cbuc.es: 9
sites
• India
• Korea
• Brazil
• UK (British Library, JISC,
Edinburgh)
• UNESCO (especially Latin
America, Eastern Europe,
Africa)
Some Countries
• Australia
• Belgium
• Brazil
• Canada
• China, Hong Kong
• Colombia
• Finland
• France
• Germany
• India (Hyderabad)
• Italy
• Korea
• Mexico
• Netherlands
• Norway
• Russia
• Singapore
• S. Africa (Rhodes U.)
• S. Korea
• Spain
• Sudan
• Sweden
• Taiwan
• UK
• USA
Institutional Members
• British Library
• Cinemedia
• Coalition for Networked Information (CNI)
• Committee on Institutional Cooperation (CIC)
• Consorci de Biblioteques Universitàries de Catalunya
• Diplomica.com
• Dissertation.com
• Dissertationen Online (Germany)
• ETDweb, a Division of Answer4.com
• Ibero-American Science & Technology Education Consortium (ISTEC)
• National Documentation Centre (NDC), Greece
• National Library of Portugal (for all universities)
• OCLC Online Computer Library Center
• OhioLINK
• Organization of American States (SEDI/OAS)
• Southeastern Library Network (SOLINET)
• UNESCO (www.unesco.org/webworld/etd)
Access Possibilities
• Access routes: Web search engines, www.theses.org, www.openarchives.org, library catalog clients
• Sources: Virginia Tech, MIT, National Library of Portugal, CBUC (Spain), OhioLINK, 3rd party services (e.g., UMI), national projects (AU, GE, …)
ETD-MS
• ETD Metadata Standard
• XML-encoded metadata standard
(content and encoding) for Electronic
Theses and Dissertations (ETDs)
• in part conforming to Dublin Core (DC)
• using UNICODE
• (optionally / later using RDF)
• Well specified relationship with MARC
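A relationship between MARC and a Dublin Core based format such as ETD-MS is usually expressed as a crosswalk: a table from MARC tag/subfield pairs to metadata elements. The sketch below is illustrative only. It assumes a simplified record layout of (tag, subfield, value) tuples and a small hypothetical mapping table; it is not the official ETD-MS crosswalk.

```python
# Hypothetical, partial MARC-to-Dublin-Core mapping (assumption, not the
# official ETD-MS crosswalk). Keys are (MARC tag, subfield code).
MARC_TO_DC = {
    ("245", "a"): "title",        # title statement
    ("100", "a"): "creator",      # main entry, personal name
    ("520", "a"): "description",  # summary / abstract
    ("650", "a"): "subject",      # topical subject heading
    ("260", "c"): "date",         # date of publication
    ("856", "u"): "identifier",   # electronic location (URI)
}

def marc_to_dc(marc_fields):
    """Convert (tag, subfield, value) tuples into a dict of DC element lists."""
    dc = {}
    for tag, code, value in marc_fields:
        element = MARC_TO_DC.get((tag, code))
        if element is not None:  # fields without a mapping are skipped
            dc.setdefault(element, []).append(value.strip())
    return dc

sample = [
    ("245", "a", "A sample thesis title"),
    ("100", "a", "Doe, Jane"),
    ("650", "a", "Digital libraries"),
    ("650", "a", "Metadata"),
    ("856", "u", "http://example.org/etd/1"),
    ("999", "a", "local data with no mapping"),
]
record = marc_to_dc(sample)
print(record)
```

Repeatable elements (here, two subjects) are kept as lists, and unmapped local fields are dropped rather than guessed at.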
NDLTD Members and ETD-MS
• NDLTD members will
• Share metadata for their ETDs
• Provide it in ETD-MS
• Or, if they use a version of MARC locally,
work to have it eventually shared in
either MARC21 or UNIMARC
• Run OAI, either locally or in consortia, so
their metadata can be harvested, according
to necessary terms and conditions
Some recent additions
• ETD individuals support
• http://etdindividuals.dlib.vt.edu:9090
• ETD discussion (e-prints)
• http://ndltdpapers.dlib.vt.edu:9090
• Conference papers and presentations
• http://www.ndltd.org/WVUproc.htm
• Marcel Dekker book in publication
What are plans at VT?
• LOCKSS welcomed us
• Lots of Copies Keeps Stuff Safe
• MARIAN: harvest, crawl/scrape, fed search
• Metadata crosswalks and format converters
• XML schema for ETDs
• Open Digital Libraries: easy to add services!
• http://oai.dlib.vt.edu/odl
Union catalog (OCLC)
• OCLC will expand the OAI data provider
on TDs
• Will get data from WorldCat
• Will harvest from all who contact them
• Need DC and either ETD-MS or MARC
• Will have a set for ETDs
Union catalog (VTLS, VT)
• VTLS will enhance search/browse service
for ETDs
• Will harvest from OCLC’s set of ETD records
• Will receive through other mechanisms, too
• Will work with MARC-21 and ETD-MS
• VT will continue to offer experimental
services
NUDL (www.nudl.org)
Int’l Research Support
• Networked University Digital Library
• Partners: Germany, Mexico (Puebla and
Monterrey), Brazil
• Problems: Multilingual search, high
performance DLs, requirements/usability, …
• Start with ETDs, then expand to other
student works, portfolios, data sets, (CS)
courseware, ...
CS Teaching Center (CSTC)
• Instead of building large, expensive multimedia packages,
that become obsolete and are difficult to re-use, concentrate
on small knowledge units.
• Learners benefit from having well-crafted modules that
have been reviewed and tested.
• Use digital libraries to build a powerful base of support for
learners, upon which a variety of courses, self-study
tutorials & reference resources can be built.
JERIC
• Journal of Educational Resources
in Computing
• Accessible from www.cstc.org and www.acm.org
• ACM and SIGCSE support
• Refereed and interactive
• Part of ACM Digital Library
www.CITIDEL.org
• Computing and Information Technology
Interactive Digital Education Library, an NSDL
Collection Track project
• Led by Virginia Tech, with co-PIs:
• Fox (director, DL systems)
• Lee (history)
• Perez (user interface, Spanish support)
• Partners
• College of New Jersey (Knox)
• Hofstra (Impagliazzo)
• Villanova (Cassel)
• Penn State (Giles)
Summary of Spring 2001 Survey of CITIDEL-related Collections and their Sizes

Size of Collection    Number of Collections Identified
1-5 items             100-300
6-100 items           50
101-999 items         20-35
1000+ items           10-25
Multi-dimensional Categorization
• Quality: peer reviewed, editor reviewed, nominated, identified by crawl
• Topic: algorithms, Java, multimedia
• Language: English, Spanish
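One way to read the categorization above is as a set of independent facets (quality, topic, language) attached to each resource, so that a collection can be narrowed along any combination of them. The following is a minimal sketch under that assumption; the `Resource` class, the `select` helper, and the sample catalog are hypothetical and do not describe how CITIDEL itself stores categories.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    """Hypothetical resource description with one value per dimension."""
    title: str
    quality: str
    topic: str
    language: str

def select(resources, **facets):
    """Keep resources whose attributes equal every requested facet value."""
    return [r for r in resources
            if all(getattr(r, name) == value for name, value in facets.items())]

catalog = [
    Resource("Sorting visualizer", "Peer reviewed", "Algorithms", "English"),
    Resource("Curso de Java", "Nominated", "Java", "Spanish"),
    Resource("Graph search notes", "Identified by crawl", "Algorithms", "Spanish"),
]

spanish_algorithms = select(catalog, topic="Algorithms", language="Spanish")
print([r.title for r in spanish_algorithms])
```

Because each dimension is filtered independently, adding a new dimension only means adding a field, not restructuring a hierarchy.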
CITIDEL Collection Sources
[Diagram: sources shown include ACM, CSTC, ResearchIndex (NEC's data), IEEE-CS, NCSTRL, ACM DL (metadata, fulltext), SIGCSE proceedings, JERIC, experts' finding aids, data processed w. R.I., Borner's info viz software repository, …]
CITIDEL Collection Building
[Diagram: building thru Nominating and Submitting; include after Creating; include after Composing using VIADUCT, after Searching and Browsing thru GetSmart; or thru Crawling aided by Classifying using Crawlifier]
Overview of CITIDEL architecture
• Layers: user portals, digital library services, repositories
Distributed repository structure
• Digital library services
• OAI data provider and OAI data harvester
• Union metadata repository
• Repositories: applets, laboratories, syllabi, papers, ...
Digital library architecture for local and interoperable CITIDEL services
• Portals: educators, learners, administrators
• Services: multilingual searching, browsing, filtering, annotating, revising, administering
• Repositories: union metadata, filtering profiles, user profiles, annotations
• OAI data provider and OAI data harvester connect to remote and peer digital libraries (e.g., NSDL-CIS)
Open Archives Initiative
OAI
www.openarchives.org
openarchives@openarchives.org
The World According to OAI
Service Providers
Discovery
Current Awareness
Data Providers
Preservation
Technical Umbrella for Practical Interoperability…
Reference
Libraries
Museums
Publishers
E-Print Archives
…that can be exploited by different communities
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
OAI – Black Box Perspective
[Diagram: services (search, browse, summarize, visualize) operate on metadata exposed by open archives OA 1 to OA 7; each archive holds its documents as digital objects (DO)]
Aggregation through OAI Harvesting
[Diagram: CITIDEL aggregates, through OAI harvesting: NCSTRL (lite sites, archive, eprints, active); its own collections (history, ResearchIndex, CSTC, …); IEEE-CS, ACM, …]
Approaches to Open Archives
Build By Institution
Build By Discipline
Author
Category
Interdisciplinary
Year
Language
Query …
OAI Perspective
• Rethink your efforts in terms of providers of
• Data, Services
• Reduced work for data providers
• Tools available
• Don’t need to offer services
• Reduced work for service providers
• Others provide the data
• Can use tools and systems for OAI, XOAI
• Results
• More data becoming available
• To more people
• Supported by improved services
data harvesting
[Diagram: a harvester retrieves data (items) from a repository through the OAI protocol]
selective harvesting - datestamps
[Diagram: the harvester harvests only the records in the repository that fall within a date range]
selective harvesting - sets
[Diagram: the repository is divided into sets S1 and S2; the harvester harvests only the records within a chosen set]
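Selective harvesting by datestamp and by set comes down to optional arguments on a harvesting request. The sketch below builds such a request URL. It assumes the argument names of version 2.0 of the protocol (`from`, `until`, `set`, `metadataPrefix`); the base URL and the helper name `list_records_url` are placeholders invented for illustration.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords request, adding only the selective arguments given."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date    # lower bound of the date range
    if until_date:
        params["until"] = until_date  # upper bound of the date range
    if set_spec:
        params["set"] = set_spec      # restrict harvesting to one set
    return base_url + "?" + urlencode(params)

BASE = "http://example.org/oai"  # placeholder, not a real data provider
url = list_records_url(BASE, from_date="2002-01-01",
                       until_date="2002-06-30", set_spec="S1")
print(url)
```

Leaving out all three optional arguments yields a request for the whole repository, which is the non-selective case.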
What is an Open Archive?
• Any WWW-based system that can be accessed through
the well-defined interface of the Open Archives
Protocol for Metadata Harvesting
• … aka OAI-Compliant Repository
• No implications for:
• Physical storage of data
• Cost of data
• Metadata and data formats
• Access control to server
Sample OAI Record
<record>
  <header>
    <identifier>oai:sigir:ws3</identifier>
    <datestamp>2001-08-13</datestamp>
  </header>
  <metadata>
    <dc>
      <title>OAI Workshop at SIGIR</title>
      <creator>Hussein Suleman</creator>
      <language>English</language>
    </dc>
  </metadata>
  <about>
    <metadataID>oai:sigir:ws3md</metadataID>
  </about>
</record>
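A harvester typically pulls the identifier and datestamp from the header and the descriptive fields from the metadata part. The sketch below parses the sample record above with Python's standard library. It relies on the simplified, namespace-free form shown on the slide; responses from a live repository normally carry XML namespaces, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# The sample record from the slide, without XML namespaces (simplification).
SAMPLE = """
<record>
  <header>
    <identifier>oai:sigir:ws3</identifier>
    <datestamp>2001-08-13</datestamp>
  </header>
  <metadata>
    <dc>
      <title>OAI Workshop at SIGIR</title>
      <creator>Hussein Suleman</creator>
      <language>English</language>
    </dc>
  </metadata>
  <about>
    <metadataID>oai:sigir:ws3md</metadataID>
  </about>
</record>
"""

root = ET.fromstring(SAMPLE.strip())

# Header: what the harvester needs for bookkeeping.
identifier = root.findtext("header/identifier")
datestamp = root.findtext("header/datestamp")

# Metadata: one dictionary entry per Dublin Core element in the record.
dc = {child.tag: child.text for child in root.find("metadata/dc")}

print(identifier, datestamp)
print(dc)
```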
Sets
• Protocol mechanism to allow for harvesting
of sub-collections
• No well-defined semantics – depends
completely on local data providers
• May be defined by arrangement between
data providers and service providers
• E.g., Subject areas, years, author names,
search queries
Protocol for Metadata Harvesting
• Service Requests
• Identify
• ListMetadataFormats
• ListSets
• GetRecord
• ListIdentifiers
• ListRecords
• Metadata Multiplicity
• Date Ranges
• Resumption Tokens
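Resumption tokens let a data provider return a large result in pages: each response carries a token, and the harvester repeats the request with that token until none is returned. The sketch below shows only that control flow against an in-memory stand-in for a repository. The functions `make_repository` and `harvest_all`, and the use of a numeric offset as the token, are assumptions made for illustration; real tokens are opaque strings chosen by the data provider.

```python
def make_repository(records, page_size):
    """Return a hypothetical list_records(token) function that pages through records."""
    def list_records(resumption_token=None):
        start = int(resumption_token) if resumption_token else 0
        page = records[start:start + page_size]
        next_start = start + page_size
        # No token on the last page signals that the list is complete.
        next_token = str(next_start) if next_start < len(records) else None
        return page, next_token
    return list_records

def harvest_all(list_records):
    """Keep requesting pages until the provider stops issuing tokens."""
    harvested = []
    token = None
    while True:
        page, token = list_records(token)
        harvested.extend(page)
        if token is None:
            break
    return harvested

repo = make_repository([f"oai:example:{i}" for i in range(7)], page_size=3)
all_ids = harvest_all(repo)
print(len(all_ids), all_ids[0], all_ids[-1])
```

The harvester never interprets the token; it only echoes it back, which is what keeps the paging scheme a private matter of the data provider.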
Example: Union Collection of ETDs
(Electronic Theses and Dissertations,
for Networked Digital Library of
Theses and Dissertations, NDLTD)
[Diagram: OAI data providers (Virginia Tech ETD Archive, Humboldt ETD Archive, Duisburg ETD Archive, …) are harvested via OAI into a Merged Metadata Collection, which acts as an OAI service provider for VIRTUA, MARIAN, and future services (recommender, …)]
Example: Details
• NDLTD Site / Member: student entry; librarian verification / validation / enrichment / maintenance; local DB; OAI server; local search / browse
• NDLTD Central: OAI harvester, conversion, MARIAN union catalog, VTLS union catalog (Virtua MARC DB)
• Name authority service (e.g., OCLC)
• Alternate MARC transport (ftp? tapes?)
Open Digital Library (ODL)
Hypothesis (Hussein Suleman)
• Can we leverage the successful model of the OAI
Protocol for Metadata Harvesting to alleviate our
architectural problems?
Maybe … if
Digital Libraries can be modeled as
• networks of extended Open Archives, where
• each extended Open Archive is a
• source of data and/or a provider of services.
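The hypothesis can be pictured as components that all expose the same harvesting-style interface, so that a union catalog, a filter, or a search service can be stacked on top of any archive or on top of each other. The following is a minimal in-memory sketch of that idea only. The class names and the `list_records` method are hypothetical and do not reproduce the actual ODL protocols or software.

```python
class Archive:
    """A source of data: holds records and lets others harvest them."""
    def __init__(self, name, records):
        self.name = name
        self.records = records

    def list_records(self):
        return list(self.records)

class Filter:
    """An extended archive: harvests from one source, re-exposes a subset."""
    def __init__(self, source, predicate):
        self.source = source
        self.predicate = predicate

    def list_records(self):
        return [r for r in self.source.list_records() if self.predicate(r)]

class UnionCatalog:
    """An extended archive: harvests several sources, merges by identifier."""
    def __init__(self, sources):
        self.sources = sources

    def list_records(self):
        merged = {}
        for source in self.sources:
            for record in source.list_records():
                merged[record["identifier"]] = record
        return list(merged.values())

class SearchService:
    """A provider of services built on whatever it can harvest."""
    def __init__(self, source):
        self.source = source

    def search(self, term):
        term = term.lower()
        return [r for r in self.source.list_records() if term in r["title"].lower()]

vt = Archive("Virginia Tech", [
    {"identifier": "oai:vt:1", "title": "Digital library services", "type": "thesis"}])
humboldt = Archive("Humboldt", [
    {"identifier": "oai:hu:1", "title": "Metadata for theses", "type": "thesis"}])
mit = Archive("MIT", [
    {"identifier": "oai:mit:1", "title": "Robot vision", "type": "thesis"},
    {"identifier": "oai:mit:2", "title": "Lab newsletter", "type": "report"}])

mit_filter = Filter(mit, lambda r: r["type"] == "thesis")
union = UnionCatalog([vt, humboldt, mit_filter])
search = SearchService(union)
hits = search.search("digital")
print(len(union.list_records()), [r["identifier"] for r in hits])
```

Because every component offers the same `list_records` interface, a new service can be added by pointing it at an existing component without changing the archives underneath.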
Example Architecture (NDLTD)
[Diagram: nodes shown are Virginia Tech, PhysNet, Humboldt, Duisburg, CalTech, MIT, Dresden, MIT Filter, Union Catalog, Search, Browse, Recent, and User Interface. Legend: user interface, OAI/ODL archive, OAI/ODL protocol]
ODL Demonstration - FrontPage
ODL Demonstration - Search
ODL Demonstration - Browse
Conclusions
• Digital libraries can help advance education.
• Singapore is invited to engage in NSDL, CITIDEL,
NDLTD, and other ventures.
• UNESCO Analytical Survey on Digital Libraries in
Education is recommending DLE in each nation.
• Local and national support can
• stimulate activities, including collaboration
• promote a sharing culture, especially in research and teaching
• leverage others’ investments (networking, computing, …)
• encourage / facilitate learning, innovation and problem solving
Selected Links
• CITIDEL
• www.citidel.org
• NCSTRL
• www.ncstrl.org
• NDLTD
• www.ndltd.org
• NSDL
• www.nsdl.org
• Virginia Tech Digital Library Courseware
• http://ei.cs.vt.edu/~dlib
• Virginia Tech Digital Library Research Laboratory (DLRL)
• http://www.dlib.vt.edu
• (5S, 5SL, AmericanSouth.Org, CSTC, ENVISION, MARIAN, NDLTD,
NSDL, OAI, ODL)
• Repository Explorer
• http://purl.org/net/oai_explorer
More Links
• ARC Cross-Archive Search Service
• http://arc.cs.odu.edu/
• Dublin Core Metadata Initiative
• www.dublincore.org
• E-Prints DL-in-a-box
• www.eprints.org
• Open Archives Initiative
• http://www.openarchives.org
• http://www.openarchives.org/OAI/openarchivesprotocol.htm
• http://www.dlib.vt.edu/projects/OAI/
• XML Schema Validator
• http://www.w3.org/2001/03/webdata/xsv
• XML Tools at W3C
• http://www.w3.org/XML/#software