The Open Archives Initiative (OAI) and Electronic Theses and Dissertations (ETDs)
ASIDIC ‘2000
Orlando, FL - March 27, 2000
Edward A. Fox (fox@vt.edu)
http://fox.cs.vt.edu
Virginia Tech, Blacksburg, VA, USA
Acknowledgements (Selected)
 Sponsors: ACM, Adobe, ARL, Belgian Science Found., CLIR, DARPA, IBM,
LANL, Microsoft, NSF, OCLC, SPARC, US Dept. of Ed. (FIPSE), …
 VT Faculty/Staff: Tony Atkins, Thomas Dunbar,
John Eaton, Gwen Ewing, Peter Haggerty, Gary
Hooper, Gail McMillan, Len Peters, James Powell
 VT Students: Emilio Arce, Fernando Das Neves,
Brian DeVane, Robert France, Marcos Goncalves,
Scott Guyer, Robert Hall, Neill Kipp, Paul Mather,
Tim McGonigle, Todd Miller, Constantinos
Phanouriou, William Schweiker, Ohm Sornil,
Hussein Suleman, Patrick Van Metre, Laura Weiss
Virginia Tech Background
 Largest university in Virginia, land-grant, football,
town population 35K plus 25K students
 Blacksburg Electronic Village, since 1992, with
> 80% of community on Internet
 Net.Work.Virginia, largest ATM network, with
over 750 sites, for education, research, gov’t
 LMDS, Local Multipoint Distribution Service,
gigabit wireless networking - 1/3 of Virginia
 Math Emporium, 500 workstations
 Faculty Development Initiative, round 2
Digital Libraries Shorten the Chain from
Editor, Reviewer, Publisher, A&I, Consolidator, Library
DLs Shorten the Chain to
Digital Library, with Author, Teacher, Reader, Editor, Reviewer, Learner, Librarian
The Networked Digital Library of Theses and Dissertations
www.NDLTD.org
Training Authors
Expanding Access
Preserving Knowledge
Improving Graduate Education
Enhancing Scholarly Communication
Empowering Students & Universities
Leader of the Worldwide ETD
(Electronic Thesis and Dissertation) Initiative
Open Archives initiative
OAi
www.openarchives.org
openarchives@openarchives.org
OAi Philosophy
Self-archiving = submission mechanism
Long-term storage system = archive
Open interface = harvesting mechanism
Data provider + service provider
Start with “gray literature”
– e-prints/pre-prints, reports, dissertations, …
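The split between data provider and service provider can be sketched roughly as follows. This is a minimal illustration, not the OAi software: the class names `Archive` and `ServiceProvider`, their methods, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Archive:
    """Data provider: accepts submissions, stores them, exposes a harvest call."""
    name: str
    records: dict = field(default_factory=dict)

    def submit(self, identifier: str, metadata: dict) -> None:
        # Self-archiving: the author deposits the record directly.
        self.records[identifier] = metadata

    def harvest(self) -> list:
        # Open interface: hand out (identifier, metadata) pairs to any harvester.
        return sorted(self.records.items())


@dataclass
class ServiceProvider:
    """Service provider: harvests metadata from archives and builds a service."""
    index: dict = field(default_factory=dict)

    def harvest_from(self, archive: Archive) -> int:
        pairs = archive.harvest()
        for identifier, metadata in pairs:
            self.index[(archive.name, identifier)] = metadata
        return len(pairs)

    def search(self, word: str) -> list:
        # A toy title search standing in for a real end-user service.
        word = word.lower()
        return [key for key, md in sorted(self.index.items())
                if word in md.get("title", "").lower()]


archive = Archive("demo-archive")
archive.submit("etd-1", {"title": "Digital Libraries and ETDs", "genre": "dissertation"})
archive.submit("rep-7", {"title": "A Technical Report", "genre": "report"})

service = ServiceProvider()
harvested = service.harvest_from(archive)
hits = service.search("digital")
print(harvested, hits)
```

The archive never needs to know which services exist; it only has to keep its harvest call open.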
Tiered Model of Interoperability
Mediator services
Metadata harvesting
Document models
Repository of Digital Objects
Repository Access Protocol
handle
terms and conditions
Digital object
Open Archives initiative History
 xxx at LANL = Los Alamos National Laboratory
(Ginsparg) for high-energy physics - 1991
 CSTR + WATERS = NCSTRL (Lagoze) - 1994
 xxx + NCSTRL = CoRR collaboration - 1998
 UPS (Universal Preprint Service) – 1999 mtg
– Herbert Van de Sompel (U. Ghent, SFX) …
– Dublin Core (DC), XML
– Dienst protocol and software (Lagoze)
 Renamed late 1999 as OAi
Open Archives (protoproto)
ArXiv & Los Alamos National Lab
CogPrints & U. Southampton
NACA & NASA (reports)
NCSTRL & Cornell U.
NDLTD & Virginia Tech
RePEc & U. Surrey
Total of around 200K records
Original Open Archives Members
 Caroline Arms, Library of Congress
 Leslie Carr, University of Southampton
 Mark Doyle, American Physical Society
 Dale Flecker, Harvard University
 Edward A. Fox, Virginia Tech
 Michael Friedman, HighWire Press, Stanford U.
 Paul M. Gherman, Vanderbilt U. & SPARC
 Paul Ginsparg, Los Alamos National Lab. & xxx
 Stevan Harnad, University of Southampton
 Thomas Krichel, University of Surrey & RePEc
 Carl Lagoze, Cornell University …
Original Open Archives Members cont’d
 Rick Luce, Los Alamos National Laboratory
 Clifford Lynch, Coalition for Networked Info.
 Kurt Maly, Old Dominion University
 Michael Nelson, NASA Langley Research Center
 John Ober, California Digital Library
 Bob Parks, Washington University & EconWPA
 Herbert Van de Sompel, University of Ghent
 Eric F. Van de Velde, Caltech
 Don Waters, The Andrew W. Mellon Foundation
 Ken Weiss, California Digital Library
Open Archives Future
 EconWPA (Washington U.)
 e-biomed -> PubMed Central (NIH)
 PubScience (DOE)
 Clinical Medicine Netprints (+ other HighWire
Press holdings)
 University ePub (California Digital Library)
 All public e-prints (MIT)
 Scholar’s Forum (Caltech)
 Int’l: CERN, Germany, India, Mexico, …
 Goal: millions of books/articles/reports / yr
Approaches to Open Archives
Build By Institution
Build By Discipline
Author
Category
Interdisciplinary
Year
Language
Query …
Open Archives initiative (OAi)
www.openarchives.org
Santa Fe meeting, Oct. 21-22, 1999, protoproto
 Next mtg June 3, San Antonio, between HT’00 & DL’00
 LANL, CNI, DLF, Mellon, …
 Convention (see Feb. D-Lib Magazine article)
 Archives -> Open Archives
– Support unique archive identifiers
– Implement Open Archives Metadata Set (DC-based, using XML)
– Implement Dienst harvesting interface
– Register the archive
 Build tools, layer other services: linking, searching, …
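A DC-based XML record with an archive-unique identifier might look like the sketch below. The Dublin Core namespace URI is the real one; the `record` wrapper element, the `archive:local` identifier form, and the sample values are illustrative assumptions and are not the actual Open Archives Metadata Set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
ET.register_namespace("dc", DC_NS)


def make_record(archive_id: str, local_id: str, fields: dict) -> ET.Element:
    """Wrap Dublin Core fields in a record carrying an archive-unique identifier."""
    # Hypothetical layout: a <record> element whose identifier attribute
    # combines the registered archive name with the archive's local id.
    record = ET.Element("record", {"identifier": f"{archive_id}:{local_id}"})
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = make_record(
    "example-archive",
    "etd-0001",
    {"title": "A Sample Dissertation", "creator": "Doe, Jane", "date": "2000"},
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)

# Round trip: a harvester can parse the text and read the same fields back.
parsed = ET.fromstring(xml_text)
title = parsed.find(f"{{{DC_NS}}}title").text
```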
Figure 1. Layers Related to Open Archives Initiative
[Layered diagram; labels recovered from the slide]
Services: Citation / Linking; SFX; CiteSeer; Authoring; Submission; Editorial: Reviewing, Certification; Summarization; Metadata Creation; Citation Checking; Text/MM Editing; Citation DB Updating; Authority Control; Preservation; Conversion; Gazetteer; Cataloging; Copy-Edit / Add Value; Search/Browse; Annotation; Collaboration
Registry: Archives (Name, ID, Description, Terms and Conditions, …); Metadata Formats; Archive Formats; Protocols; Services; Tools; …
Registry attribute labels: "Name, Standard, Preservation Process, …" and "Name, XML DTD, …"
Repository
Repository for NDLTD
Metadata Formats: OA Metadata Set, NDLTD Standard (DC-based) Set
Transaction Log
Training Resources
Open Archives Harvesting Protocol
VT Partition: Record (Metadata), Record (Full Content)
UVA Partition: Metadata …, Content …
Caltech Partition: Metadata, Content
…
Other repositories shown: NCSTRL Repository, EconWPA Repository, RePEc Repository
Mechanisms
 Sharing
– Join federation, run software
– Make metadata and archive available
 Aggregating
– By discipline
– By institution
– By genre
 Automating
– Workflow
– Harvesting and providing services
– Federated searching
– Dynamic linking (e.g., with SFX)
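The aggregating step above can be sketched as a simple grouping over harvested metadata. The record layout and field names (`id`, `institution`, `discipline`, `genre`) are assumptions made for illustration only.

```python
from collections import defaultdict


def aggregate(records: list, key: str) -> dict:
    """Group record identifiers by one metadata field (e.g. discipline)."""
    groups = defaultdict(list)
    for rec in records:
        # Records lacking the field are collected under "unknown".
        groups[rec.get(key, "unknown")].append(rec["id"])
    return dict(groups)


# Hypothetical harvested metadata from two archives.
harvested = [
    {"id": "a:1", "institution": "Univ A", "discipline": "physics", "genre": "e-print"},
    {"id": "a:2", "institution": "Univ A", "discipline": "cs", "genre": "dissertation"},
    {"id": "b:1", "institution": "Univ B", "discipline": "cs", "genre": "report"},
]

by_discipline = aggregate(harvested, "discipline")
by_institution = aggregate(harvested, "institution")
by_genre = aggregate(harvested, "genre")
print(by_discipline)
```

The same harvested set supports all three views, which is why harvesting once and layering services on top is attractive.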
Report on Open Archives work in progress at Virginia Tech
With students:
Hussein Suleman (hussein@vt.edu)
Dave Watkins (dwatkins@cs.vt.edu)
Robert France (france@vt.edu)
Marcos Andre Goncalves (mgoncalv@cs.vt.edu)
VT View of the
Open Archives initiative (OAi)
Enable sharing of publication metadata and full-text by digital libraries
Standardize low-level mechanisms to share contents of libraries
Build higher-level user-centric and administrative services in meta-libraries
Install organizational mechanisms to support the technical processes
Virginia Tech Projects
 MARC XML-DTD
 Computer Science Teaching Center (CSTC)
 W3C Web Characterization Repository
 OAi Repository Explorer
 Networked Digital Library of Theses and Dissertations (NDLTD)
MARC XML-DTD
 XML Transport format for US-MARC records
 Standardized metadata exchange format
for traditional library services joining OAi
CS Teaching Center (CSTC)
 Collection of reviewed online resources used
to aid in teaching of Computer Science
 Supports author submission and peer-review
process for new ACM Journal of Educational
Resources In Computing (JERIC)
 Connected with NSDL (NSF 00-44)
 http://www.cstc.org
W3C Web Characterization
Repository
 Online database of metadata related to
publications, tools and data sets dealing with
Web characterization
 Project of the Web Characterization Activity
working group of the World-Wide-Web
Consortium (www.w3c.org/WCA)
 http://purl.org/net/repository
OAi Repository Explorer
 Serves as a compliance test
 Allows browsing of open archives using
only OAi protocol
 Sends requests on behalf of user, parses and
checks responses and displays browsable
interface
 Will detect most discrepancies in protocol
 http://purl.org/net/explorer
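The parse-and-check step can be illustrated with a small validator. This is only a sketch of the idea, not the Repository Explorer itself: the response layout (`records` / `record` elements, an `identifier` attribute) and the required field names are hypothetical.

```python
import xml.etree.ElementTree as ET


def check_response(xml_text: str, required=("title", "creator")) -> list:
    """Return a list of human-readable discrepancies found in one response."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    records = root.findall("record")
    if not records:
        problems.append("no <record> elements in response")
    for pos, record in enumerate(records, start=1):
        if not record.get("identifier"):
            problems.append(f"record {pos}: missing identifier")
        for name in required:
            element = record.find(name)
            if element is None or not (element.text or "").strip():
                problems.append(f"record {pos}: missing {name}")
    return problems


good = ('<records><record identifier="x:1">'
        "<title>T</title><creator>C</creator></record></records>")
bad = "<records><record><title>T</title></record></records>"

good_report = check_response(good)
bad_report = check_response(bad)
broken_report = check_response("<records>")
print(good_report, bad_report, broken_report, sep="\n")
```

An empty report means the response passed every check in this sketch; a real compliance test would cover far more of the protocol.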
NDLTD
 Work has begun on interoperability between
Virginia Tech and partners in Germany
 Wrappers have been created to harvest data
from remote sites which use other protocols
 Harvested data to be stored in a central
OAi-compliant database (work in progress)
[Diagram labels: Grad Program, Library, IT, Ed Tech]
A Digital Library Case Study
Domain: graduate education, research
Genre: ETDs = electronic theses & dissertations
Submission: http://etd.vt.edu
Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org
with 225 people at 3rd Intl Symposium, March 2000
What are we doing?
 Aiding universities to enhance graduate
education, publishing and IPR efforts
 Helping improve the availability and
content of theses and dissertations
 Educating ALL future
scholars so they can
publish electronically and effectively use
digital libraries (i.e., are Information
Literate and can be more expressive)
Key Ideas:
Scalability
Networked infrastructure
University collaboration
Workflow, automation
Education is the rationale
Maximal Access
8th graders vs. grads
Authors must submit
Standards
PDF, SGML, MM, MARC, DC, URNs
Federated search
Student Defends & Finalizes ETD
Student Gets Committee Signatures and Submits ETD
Graduate School Approves ETD, Student is Graduated
Library Catalogs ETD, Access is Opened to the New Research
[Diagram labels: My Thesis, ETD, Signed, Grad School, Ph.D., WWW]
NDLTD
User Search Support (multilingual, XML)
NDLTD World Federated Search
User Interface
Virginia Tech ... (univ)
Dissertationen Online (Germany)
OhioLink (lib / univ group)
Portuguese NL ... (national lib)
Australia (regional)
OAS, ISTEC (Latin America)
Note: All groups shown are connected with NDLTD.
www.theses.org
 James Powell student project, D-Lib
Magazine description in Sept. 1998
 XML description of each site
– type of search engine / service
– language
– coverage (for resource discovery)
 Adding Z39.50 gateway capability and
integrating with MARIAN, along with Harvest
and Open Archives protocols
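A per-site description of this kind can drive a federated search roughly as sketched below. The field names, the engine labels (`z3950`, `web-form`), the site names, and the stand-in backend are all hypothetical; the real site descriptions are XML and the real searches go to remote services.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteDescription:
    name: str
    engine: str          # type of search engine / service
    language: str
    coverage: frozenset  # subject areas, used for resource discovery


def select_sites(sites, language=None, topic=None):
    """Pick the sites worth querying, based on their descriptions."""
    chosen = []
    for site in sites:
        if language and site.language != language:
            continue
        if topic and topic not in site.coverage:
            continue
        chosen.append(site)
    return chosen


def federated_search(sites, query, backends):
    """Send the query to each selected site's backend and merge the hits."""
    merged = []
    for site in sites:
        search = backends[site.engine]
        merged.extend((site.name, hit) for hit in search(site.name, query))
    return merged


def fake_backend(site_name, query):
    # Stand-in for a real remote search; returns one canned hit per site.
    return [f"{query} result from {site_name}"]


sites = [
    SiteDescription("Site A", "z3950", "en", frozenset({"cs", "physics"})),
    SiteDescription("Site B", "web-form", "de", frozenset({"chemistry"})),
    SiteDescription("Site C", "web-form", "en", frozenset({"chemistry"})),
]
backends = {"z3950": fake_backend, "web-form": fake_backend}

english = select_sites(sites, language="en")
results = federated_search(english, "polymers", backends)
chemistry = select_sites(sites, topic="chemistry")
print(results)
```

Keying the backends by engine type is one way to let a single interface talk to sites that run different search software.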
Access Possibilities
[Diagram labels: Web search engines; www.theses.org; www.openarchives.org; library catalog clients; 3rd Party Services (e.g., UMI); Virginia Tech; MIT; National Library of Portugal; CBUC (Spain); OhioLink; National Projects: AU, GE, …]
PetaPlex
 Digital Library Machine (“super” object store)
 Parallel computer / storage utility
 Knowledge Systems Incorporated is
supplying VT-PetaPlex-1 with
– high speed backbone connection
– 2.5 terabytes through 100 nodes:
Net connection + 25GB disk +
233 MHz Pentium + Linux
How does this relate to UMI?
 1987
UMI workshop to explore ETDs
 Support
letter for US Dept. of Ed. proposal
 Steering
committee membership
 ProQuest
Direct pilot of scanning works
started 1/1/97, free 2 yr access to front part
 Collaborating
–
–
on:
accepting electronic author submissions
standards (e.g., representation)
ETD Initiative (and UMI)
Students
Learn about DL, EPub
TDs become more expressive
Global TDs become more accessible, archived
Universities
UMI
N. Amer. (T)Ds are accessible, archived
NDLTD Members
[Chart: Number of Members (axis 0 to 80) by Date Joined (3/11/97 through 11/11/99)]
US University Members (41)
Air University (Alabama)
Baylor University
Brigham Young University
Caltech
Clemson University
College of William & Mary
Concordia University (Illinois)
East Carolina University
East Tenn. State U. – require fall 2000
Florida Institute of Tech.
Florida International University
George Washington University
Marshall University (W. Va.)
Miami U. of Ohio
MIT
Michigan Tech
Naval Postgraduate School (CA)
North Carolina State U.
Penn. State University
Rochester Institute of Tech.
U. of Colorado Health Science Center
U. of Florida
U. of Georgia
University of Hawaii, Manoa
U. of Iowa
U. of Kentucky
U. of Maine
U. of North Texas – required since 8/99
U. of Oklahoma
U. of South Florida
U. of Tennessee, Knoxville
U. of Tennessee, Memphis
U. of Texas at Austin
U. of Virginia
U. Wisconsin - Madison
Vanderbilt U.
Virginia Commonwealth U.
Virginia Tech - required since 1/97
West Virginia U. - required fall 1998
Western Michigan U.
Worcester Polytechnic Inst.
Institutional Members
 Coalition for Networked Information (CNI)
 Committee on Institutional Cooperation (CIC)
 Diplomica.com
 Dissertation.com
 Dissertationen Online (Germany)
 Ibero-American Science & Technology Education
Consortium (ISTEC, www.istec.org)
 National Library of Portugal (for all universities)
 Organization of American States (SEDI/OAS)
 UNESCO (www.unesco.org/webworld/etd)
Australian Project Members
 U. New South Wales (lead institution)
 U. of Melbourne
 U. of Queensland
 U. of Sydney
 Australian National University
 Curtin U. of Technology
 Griffith U.
German Project Members
 Humboldt University (lead institution)
 3 other universities
 5 learned societies
– Mathematics, Physics, Chemistry,
Sociology, Education
 1 computing center
 2 major libraries
CBUC (www.cbuc.es, Spain)
 Consorci de Biblioteques Universitàries de
Catalunya, as group, with 9 members:
– Universitat de Barcelona
– Universitat Autònoma de Barcelona
– Universitat Politècnica de Catalunya
– Universitat Pompeu Fabra
– Universitat de Girona
– Universitat de Lleida
– Universitat Rovira i Virgili
– Universitat Oberta de Catalunya
– Biblioteca de Catalunya
Other International Members
Chinese University of Hong Kong
Chungnam National U. (S. Korea - CS)
City University, London (UK)
Darmstadt U. of Tech. (Germany)
Free University of Berlin (GE - Vet. Med.)
Gyeongsang National U. (Korea)
Indian Institute of Tech., Bombay (India)
Nanyang Technological U. (Singapore, pt)
National U. of Singapore (Singapore, pt)
Other International Members cont’d
Polytechnic University of Valencia (Spain)
Rhodes U. (South Africa)
St. Petersburg St. Tech.U (Russia)
Univ. de las Américas Puebla (Mexico)
Univ. of Alicante (Spain)
Univ. of Pisa (Italy)
U. Laval; U. of Guelph; U. Waterloo;
Wilfrid Laurier U. (Canada), …
What are the long term goals?
 400K US students / year getting grad degrees are
exposed / involved
 200K/yr rich hypermedia ETDs that may turn into
electronic portfolios (images, video, audio, …)
 Dramatic increase in knowledge sharing: literature
reviews, bibliographies, …
 Services providing lifelong access for students:
browse, search, prior searches, citation links
 Hundreds/thousands of downloads / year / work
For professional societies
 Like “writing across the curriculum”, e.g.,
Chemical Markup Language, MathML, …
 Besides writing: computing/communications,
information literacy, personal digital library
management, tool use, research methods,
collaboration, archiving/preservation
 Data sets, communities of users of them
 Classification systems / browsing / searching
Extending Services - 1 of 2
 Working with publishers
– Motivate students: awards, …
– Publicize support of NDLTD: ACM, ACS, IEEE-CS, Elsevier, …
– Allow students to increase level of access
 Arranging preservation
– Mirroring worldwide
– Involving long-term trusted parties
Extending Services - 2 of 2
Adding services currently prototyped
– annotation and SDI (routing) capabilities
– Dublin Core metadata, crosswalk to MARC
– support for XML, *ML, preservation
– harvesting, federated search
Adding other services planned
– building/using citation DB (CiteSeer, SFX, …)
– implementing plagiarism check (like “SCAM”)
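A Dublin Core to MARC crosswalk is, at its simplest, a table lookup from element names to field tags and subfield codes. The sketch below is a simplified illustration: the tag choices follow commonly used conventions but should be treated as assumptions, not as the NDLTD mapping, and the sample record is invented.

```python
# Illustrative mapping: Dublin Core element -> (MARC tag, subfield code).
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
}


def crosswalk(dc_record: dict) -> tuple:
    """Convert {element: [values]} into sorted (tag, subfield, value) triples.

    Returns the triples plus the list of elements that had no mapping.
    """
    fields = []
    unmapped = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped.append(element)
            continue
        tag, code = target
        for value in values:
            fields.append((tag, code, value))
    return sorted(fields), unmapped


dc = {
    "title": ["A Sample Dissertation"],
    "creator": ["Doe, Jane"],
    "subject": ["digital libraries", "metadata"],
    "rights": ["open"],
}
marc_fields, unmapped = crosswalk(dc)
print(marc_fields, unmapped)
```

Reporting unmapped elements rather than dropping them silently makes it easier to see where a crosswalk loses information.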
Remember!
Digital Libraries (technology base)
OAi (help establish enormous international
cooperative of data and service providers)
NDLTD - improve graduate education
– www.ndltd.org/join
– (www.ndltd.org/talks for this)