20060824IBICTp2 - Edward A. Fox

Symposium: Open Access to Information
Panel 2: Open Access & Institutional Repositories
24 August 2006, Brasilia
Digital Libraries, Electronic Theses and
Dissertations (ETDs), and NDLTD
http://fox.cs.vt.edu/talks/2006/20060824IBICTp2
Edward A. Fox, fox@vt.edu
Executive Director, NDLTD
Chair, IEEE-CS Tech. Committee on Digital Libraries
Professor, Department of Computer Science
Director, Digital Library Research Laboratory
Virginia Tech, Blacksburg, VA 24061 USA
Outline
• Key Ideas
• Acknowledgements
• Digital Libraries
• DLs & Scholarly Communication
• Institutional Repositories
• NDLTD
• Summary
• DL Futures
Key Ideas - Overview
• Theorem 1: Supporters of Open Access
should support NDLTD.
• Theorem 2: 5S can guide us to better
support of Open Access.
Acknowledgements
• Students
• Faculty, Staff
• Collaborators
• Support
• Mentors
Acknowledgements: Students
• Pavel Calado, Yuxin Chen, Fernando Das
Neves, Shahrooz Feizabadi, Robert
France, Marcos Gonçalves, Nithiwat
Kampanya, S.H. Kim, Aaron Krowne, Bing
Liu, Ming Luo, Paul Mather, Unni
Ravindranathan, Ryan Richardson, Rao
Shen, Ohm Sornil, Hussein Suleman,
Ricardo Torres, Wensi Xi, Baoping Zhang,
Qinwei Zhu, …
Acknowledgements: Faculty, Staff
• Lillian Cassel, Debra Dudley, Roger
Ehrich, Joanne Eustis, Weiguo Fan,
James Flanagan, C. Lee Giles, Eberhard
Hilf, John Impagliazzo, Filip Jagodzinski,
Rohit Kelapure, Neill Kipp, Douglas
Knight, Deborah Knox, Aaron Krowne,
Alberto Laender, Gail McMillan, Claudia
Medeiros, Manuel Perez, Naren
Ramakrishnan, Layne Watson, …
Other Collaborators (Selected)
• Brazil: FUA, IBICT, UFMG, UNICAMP, USP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Humboldt U., U. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia
• VTLS (slides on digital repositories, NDLTD)
Acknowledgements: Support
• Course: UNESCO, CETREDE, IFLALAC, AUGM, CLEI, UFC
• Sponsors: ACM, Adobe, AOL, CAPES,
CNI, CONACyT, DFG, IBM, Microsoft,
NASA, NDLTD, NLM, NSF (IIS-9986089,
0086227, 0080748, 0325579, 0535057;
ITR-0325579; DUE-0121679, 0136690,
0121741, 0333601), OCLC, SOLINET,
SUN, SURA, UNESCO, US Dept. Ed.
(FIPSE), VTLS
Acknowledgements - Mentors
• JCR Licklider – undergrad advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Before, at ARPA, funded start of Internet
• Michael Kessler – BS thesis advisor
– Project TIP (technical information project)
– Defined bibliographic coupling
• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”
Digital Libraries
• Definitions
• DL Manifesto – Reference Model
• Book in process (Fox & Gonçalves), 5S
• DL Curriculum Project
DL Definitions - 1
• “A digital library is an organized and
focused collection of digital objects,
including text, images, video, and audio,
along with methods of access and
retrieval, and for selection, creation,
organization, maintenance, and sharing of
the collection.”
• Witten & Bainbridge – “How to Build a
Digital Library” – Morgan Kaufmann 2003
DL Definitions - 2
• “Digital libraries are organizations that
provide the resources, including the
specialized staff, to select, structure, offer
intellectual access to, interpret, distribute,
preserve the integrity of, and ensure the
persistence over time of collections of
digital works so that they are readily and
economically available for use by a defined
community or set of communities”
• Waters, D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html
DL Definitions - 3
• Issues and Spectra
– Collection vs. Institution
– Content vs. System
– Access vs. Preservation
– “Free” vs. Quality
– Managed vs. Comprehensive
– Centralized vs. Distributed
DL Definitions - 4
• NOT a “digitized library”
• NOT a “deconstruction” of existing
systems and institutions, moving them to
an electronic box in a Library
• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting,
– Organizing, Preserving,
– Accessing, Propagating, Re-using
Digital Library Content
Content types (with examples):
• Text Documents: Articles, Reports, Books
• Video, Audio: Speech, Music
• Geographic Information: (Aerial) Photos
• Software, Programs: Models, Simulations
• Bio Information: Genome (human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT
DL Manifesto - 1
• DL Reference Model
• In support of the future European Digital Library
• Developed by team connected with DELOS
(Candela, Castelli, Ioannidis, Koutrika, Meghini,
Pagano, Ross, Schek, Schuldt)
• Draft 2.2 presented in Frascati, near Rome,
June 2006 – 79 pages
• Could be integrated with work of DLF, JISC, etc.
DL Manifesto – 2: 3 Tiers
DL Manifesto – 3: Main Concepts
DL Manifesto – 4: Actor Roles
Fox & Gonçalves DL Book Parts
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
• Part 2 – Higher DL Constructs
• Part 3 – Advanced Topics
• Appendix
Book Parts and Chapters - 1
• Ch. 1. Introduction (Motivation, Synopsis)
• Part 1 – The “Ss”
– Ch. 2: Streams
– Ch. 3: Structures
– Ch. 4: Spaces
– Ch. 5: Scenarios
– Ch. 6: Societies
Informal 5S & DL Definitions
DLs are complex systems that
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
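A Minimal DL in the 5S Framework
[Figure: concept map that starts from the 5Ss (Streams, Structures, Spaces, Scenarios, Societies), passes through intermediate concepts (Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, hypertext, and services such as indexing, browsing, and searching), and builds up to Digital Object, Collection, Metadata Catalog, and Repository, culminating in the Minimal DL]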
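The constructs named on this slide can be made concrete with a small sketch. This is only an illustration, not the formal 5S definition: the class names, field names, and the simple term index are all assumptions made for this example, and streams are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DigitalObject:
    """A digital object: a handle plus named streams of content."""
    handle: str
    streams: Dict[str, str]  # stream name -> content (text only in this sketch)


@dataclass
class MinimalDL:
    """Toy digital library: a collection, a metadata catalog, basic services."""
    collection: Dict[str, DigitalObject] = field(default_factory=dict)
    catalog: Dict[str, Dict[str, str]] = field(default_factory=dict)  # handle -> metadata
    index: Dict[str, Set[str]] = field(default_factory=dict)          # term -> handles

    def add(self, obj: DigitalObject, metadata: Dict[str, str]) -> None:
        """Store the object and its metadata, then index the metadata terms."""
        self.collection[obj.handle] = obj
        self.catalog[obj.handle] = metadata
        for value in metadata.values():
            for term in value.lower().split():
                self.index.setdefault(term, set()).add(obj.handle)

    def search(self, query: str) -> List[str]:
        """Searching service: handles whose metadata contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)

    def browse(self, field_name: str) -> Dict[str, List[str]]:
        """Browsing service: handles grouped by the value of one metadata field."""
        groups: Dict[str, List[str]] = {}
        for handle, md in self.catalog.items():
            groups.setdefault(md.get(field_name, "(none)"), []).append(handle)
        return {key: sorted(handles) for key, handles in groups.items()}


# Hypothetical sample data, for illustration only.
dl = MinimalDL()
dl.add(DigitalObject("etd-1", {"pdf": "(content omitted)"}),
       {"title": "Digital Libraries and Open Access", "type": "thesis"})
dl.add(DigitalObject("etd-2", {"pdf": "(content omitted)"}),
       {"title": "Harvesting Metadata", "type": "dissertation"})
print(dl.search("open access"))
print(dl.browse("type"))
```

Here the societies are simply whoever calls `search` and `browse`, and the scenarios are those two service calls; a fuller model would represent both explicitly.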
Book Parts and Chapters - 2
• Part 2 – Higher DL Constructs
– Ch. 7: Collections
– Ch. 8: Catalogs
– Ch. 9: Repositories and Archives
– Ch. 10: Services
– Ch. 11: Systems
– Ch. 12: Case Studies
Book Parts and Chapters - 3
• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives
• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings
DL Curriculum Framework
[Figure: matrix of topic clusters arranged by course structure (Semester 1: DL collections: development/creation; Semester 2: DL services and sustainability) and by core DL topics vs. related topics. Topic clusters shown:]
• Digitization, Storage, Interchange
• Metadata, Cataloging, Author submission
• Digital objects, Composites, Packages
• Architectures (agents, buses, wrappers/mediators), Interoperability
• Spaces (conceptual, geographic, 2/3D, VR)
• Documents, E-publishing, Markup
• Multimedia streams/structures, Capture/representation, Compression/coding
• Bibliographic information, Bibliometrics, Citations
• Content-based analysis, Multimedia indexing
• Naming, Repositories, Archives
• Services (searching, linking, browsing, etc.)
• Archiving and preservation, Integrity
• Thesauri, Ontologies, Classification, Categorization
• Multimedia presentation, rendering
• Info. needs, Relevance, Evaluation, Effectiveness
• Intellectual property rights mgmt., Privacy, Protection (watermarking)
• Routing, Filtering, Community filtering
• Search & search strategy, Info seeking behavior, User modeling, Feedback
• Info summarization, Visualization
Project Teams/NSF Grant
• Project Team at VT (IIS-0535057):
– PI: Dr. Edward A. Fox (fox@vt.edu)
– GRA: Seungwon Yang (seungwon@vt.edu)
• Project Team at UNC-CH (IIS-0535060):
– Co-PI: Dr. Barbara Wildemuth
(wildem@ils.unc.edu)
– Co-PI: Dr. Jeffrey Pomerantz
(pomerantz@unc.edu)
– GRA: Sanghee Oh (shoh@email.unc.edu)
DLs & Scholarly Communication
• Asynch
• Information Life Cycle
• Flattening
• Author skills, toward Semantic Web
• Crossing the Chasm
• OAI
Asynchronous, Digital Library Mediated Scholarly Communication
Different time and/or place
Information Life Cycle
[Figure: cycle of paired stages: Authoring / Modifying; Organizing / Indexing; Storing / Retrieving; Distributing / Networking; Accessing / Filtering; Using / Creating; Retention / Mining]
Digital Libraries Shorten the Chain from
[Figure: chain of intermediaries: Editor, Reviewer, Publisher, A&I, Consolidator, Library]
DLs Shorten the Chain to
[Figure: Digital Library linking Author, Reader, Editor, Reviewer, Teacher, Learner, and Librarian]
Important skills for authors
• Authoring (Word Processing -> e-pub)
• Rendering, presenting
• Tagging, Markup (XML, SGML)
• “Semi-structured information”
• Dual-publishing, eBooks
• Styles (XSL, XSLT)
• Structured queries
OAI – Repository Perspective
Required: Protocol
[Figure: repository holding multiple DO and MDO items, exposed through the required protocol]
OAI – Black Box Perspective
[Figure: seven open archives, OA 1 through OA 7, each shown as a black box]
The World According to OAI
• Service Providers: Discovery, Current Awareness, Preservation
• Data Providers
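To make the data provider / service provider split concrete, here is a minimal sketch of what a service provider does when harvesting over OAI-PMH: it builds a `ListRecords` request and reads identifiers and titles from the response. The verb, the `oai_dc` metadata prefix, and the XML namespaces come from the OAI-PMH specification; the base URL, the set name `etd`, and the sample response are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Hypothetical response from a data provider (abbreviated).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.edu:etd-0001</identifier>
        <datestamp>2006-08-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Sample Electronic Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://example.edu/oai", set_spec="etd"))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also follow `resumptionToken` elements to page through large result sets; that step is left out to keep the sketch short.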
Institutional Repositories
• Definitions, Goals
• Eprints
• DSpace
• Fedora, VITAL
• Comparisons
• ODL + 5S Suite (not shown)
Institutional Repositories - 1
• “Institutional repositories are digital
collections that capture and preserve the
intellectual output of a single university or
a multiple institution community of colleges
and universities.”
• Crow, R. “Institutional repository checklist
and resource guide”, SPARC, Washington,
D.C., USA
• www.arl.org/sparc/IR/IR_Guide_v1.pdf
Institutional Repositories - 2
• “A university-based institutional repository is a set
of services that a university offers to the members
of its community for the management and
dissemination of digital materials created by the
institution and its community members. It is most
essentially an organizational commitment to the
stewardship of these digital materials, including
long-term preservation where appropriate, as well
as organization and access or distribution.”
• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7,
Feb. 2003, www.arl.org/newsltr/226/ir.html
What is a Digital Object Repository?
• Also called: digital rep., digital asset rep., institutional repository
• Stores and maintains digital objects (assets)
• Provides external interface for Digital Objects
– Creation, Modification, Access
• Enforces access policies
• Provides for content type disseminations
Adapted from Slide by V. Chachra, VTLS
Goals of Institutional Repositories
(by Stevan Harnad, U. Southampton)
• Self Archiving of Institutional Research
– Theses and Dissertations (VTLS NDLTD Project)
– Article preprints and postprints
– Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects
Adapted from Slide by V. Chachra, VTLS
What is Fedora™?
Flexible Extensible Digital Object
Repository Architecture
• Slides courtesy Vinod Chachra of VTLS
History of Fedora™
• 1997-Present
– DARPA and NSF-funded research project at Cornell
(Conceptual framework developed by Sandra Payette and Carl
Lagoze)
– Reference implementation developed at Cornell
• 1999-2001
– University of Virginia digital library prototype (Thornton
Staples and Ross Wayland)
• 2002-Present
– Andrew W. Mellon Foundation granted Virginia and Cornell $1
million to develop a production-quality Fedora system
– Fedora 1.0 released in May 2003 as Open Source under the
Mozilla public license.
Fedora™ Terms
• Metadata
• Digital Objects (data)
• Complex Objects (objects consisting of many objects in a complex/hierarchical relationship)
• Content (data and metadata together)
• Datastreams (content for dissemination)
• Disseminators (services): a dissemination is defined as a stream of data that manifests a view of the digital object's content.
Digital Object w. multiple datastreams
[Figure: a Digital Object with several datastreams, including DC, EAD, and Admin Metadata]
Example Disseminators
[Figure: a digital object with a Persistent ID (PID), System Metadata, Datastreams, and two Disseminators:]
• Default: Get Profile, List Items, Get Item, List Methods, Get DC Record
• Simple Image: Get Thumbnail, Get Medium, Get High, Get VeryHigh
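The object model on the last few slides (a persistent ID, datastreams, and disseminators that expose views of the content) can be sketched in a few lines. This is a toy model and not the Fedora API: the class, the method names, and the sample PID and datastream contents are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DigitalObject:
    """Toy digital object: a persistent ID, datastreams, and disseminators."""
    pid: str
    datastreams: Dict[str, bytes] = field(default_factory=dict)
    # disseminator name -> {method name -> function that takes the object}
    disseminators: Dict[str, Dict[str, Callable]] = field(default_factory=dict)

    def list_methods(self):
        """All (disseminator, method) pairs this object supports."""
        return sorted(
            (name, method)
            for name, methods in self.disseminators.items()
            for method in methods
        )

    def disseminate(self, disseminator, method):
        """Run one method and return the resulting view of the content."""
        return self.disseminators[disseminator][method](self)


# Hypothetical object with a DC record and two image renditions.
obj = DigitalObject(
    pid="demo:1",
    datastreams={
        "DC": b"<dc:title>Sample image</dc:title>",
        "THUMB": b"small",
        "HIGH": b"large",
    },
)
obj.disseminators["Default"] = {
    "list_items": lambda o: sorted(o.datastreams),
    "get_dc_record": lambda o: o.datastreams["DC"],
}
obj.disseminators["SimpleImage"] = {
    "get_thumbnail": lambda o: o.datastreams["THUMB"],
    "get_high": lambda o: o.datastreams["HIGH"],
}

print(obj.list_methods())
print(obj.disseminate("SimpleImage", "get_thumbnail"))
```

Keeping metadata (`DC`) and content (`THUMB`, `HIGH`) in the same datastream dictionary mirrors the point that both are handled the same way; the disseminators are the only place that knows how to present them.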
[Figure: Fedora™ Repository architecture. Components shown:]
• Clients: Client Application, Batch Program, Web Browser, Server Application, connecting over HTTP and SOAP
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider (HTTP)
• Session Management, User Authentication
• Management Subsystem: Object Mgmt, Component Mgmt, Object Validation, PID Generation
• Security Subsystem: Policy Mgmt, Policy Enforcement, Users/Groups
• Access Subsystem: Object Reflection, Object Dissemination
• External Content Retriever: fetches from External Content Sources over HTTP or FTP
• Local Service (HTTP) and Remote Service (SOAP)
• Storage Subsystem: Digital Objects (XML Files), Datastreams, Policies, Content, Relational DB
Adapted from Slide by V. Chachra, VTLS
Fedora Advantage
• Extensible digital object model
• Repository exposed by Web services APIs
– Management (Creation, Deletion, Maintenance,
Validation)
– Access (Search, Disseminations)
• Scalable, persistent storage for content and
metadata
• Content can be local and/or remote
• Content versioning
• Open source solution
Comparison of DSpace and Fedora
• DSpace is a standalone product in a box, whereas Fedora can be standalone or integrated with an ILS.
• In Fedora, metadata and content are treated the same way, as datastreams; in DSpace, metadata and content get separate treatment.
• Fedora can define complex objects more easily.
• DSpace is not as extensible as Fedora, as it deals with both repositories and workflows. Fedora focuses only on the data model.
• Fedora uses the Mozilla licensing model and DSpace uses a GNU license. The Mozilla license makes it easier for software companies to provide extensions to the model.
VITAL / Fedora Relationship
Prospero: Summary of features of the three software packages compared
• What you get
– DSpace: A package with front-end web interface directly linked to a database
– E-prints: A package with front-end web interface directly linked to a database
– Fedora: A repository database, with internal database
• Server requirements
– DSpace: Unix environment, Java, Apache Ant, Apache Tomcat, PostgreSQL or Oracle
– E-prints: Unix environment, Perl, Apache+mod-perl, MySQL
– Fedora: Unix or Windows, Java (optional: MySQL or Oracle)
• Subject classification
– DSpace: Yes
– E-prints: Yes
– Fedora: Yes
• Community groups
– DSpace: Yes
– E-prints: No
– Fedora: Possible but … (see below)
• Where from?
– DSpace: MIT and Hewlett-Packard
– E-prints: Southampton University, outcome of a JISC project
– Fedora: Cornell University and the University of Virginia Library
NDLTD
• DL case study
• Goals
• How, Workflow
• Union Catalog
• Services atop the Union Catalog
• Sustainability and Impact
• UK related report (Aug. 2006)
A Digital Library Case Study
• Domain: graduate education, research
• Genre: ETDs = electronic theses & dissertations
• Submission: http://etd.vt.edu
• Collection: http://www.theses.org
Project: Networked Digital Library of Theses & Dissertations (NDLTD)
http://www.ndltd.org
NDLTD Goals
• For Students:
– Gain knowledge and skills for the Information Age,
especially about Digital Libraries
– Richer communication (digital information, multimedia,
…)
• For Universities:
– Easy way to enter the digital library field and benefit
thereby
• For the World:
– Global digital library – large, useful, many services
NDLTD: How can a university get involved?
• Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.
• Join online, give us contact names
– www.ndltd.org/join
• Adapt Virginia Tech or other proven approach
– Build interest and consensus
– Start trial / allow optional submission
[Figure: ETD workflow. Student gets committee signatures and submits ETD -> signed ETD goes to Grad School -> Library catalogs ETD, access is opened to the new research -> WWW, NDLTD]
Union catalog: OCLC
• OCLC will expand OAI data provider on
TDs.
• Is getting data from WorldCat (so, from
many sites!).
• Will harvest from all others who contact
them.
• Need DC and either ETD-MS or MARC.
• Has a set for ETDs.
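The metadata requirement above (DC plus either ETD-MS or MARC) amounts to a simple check on the formats a site offers. The sketch below assumes the site's formats are identified by OAI-PMH metadata prefixes; `oai_dc` is the prefix required by OAI-PMH, while `oai_etdms` and `marc21` are assumed prefix names that a given site may spell differently.

```python
def meets_union_catalog_requirement(prefixes,
                                    etdms_prefix="oai_etdms",
                                    marc_prefix="marc21"):
    """True if a site offers DC and at least one of ETD-MS or MARC.

    `prefixes` is any iterable of metadata prefix strings, for example the
    values reported by a site's ListMetadataFormats response.
    """
    offered = set(prefixes)
    return "oai_dc" in offered and (
        etdms_prefix in offered or marc_prefix in offered
    )


# Hypothetical sites, for illustration only.
print(meets_union_catalog_requirement(["oai_dc", "oai_etdms"]))
print(meets_union_catalog_requirement(["oai_dc"]))
```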
ETD Union Search Mirror Site in China (CALIS)
(http://ndltd.calis.edu.cn – popular site!)
VTLS Union Catalog
Content Languages
• The VTLS NDLTD Union Catalog has data in 6 different languages. These are:
– English
– German
– Greek
– Korean
– Portuguese
– Spanish
• Examples follow
Full-text Services
• Running since Sept 2005: Scirus
• In beta test: Google Scholar
• Challenges:
– Data quality problems
– Inconsistency in way to get from metadata to
the full-text file(s)
– Broadening the coverage since OAI use has
not spread as widely as we would like
What are we doing?
• Aiding universities to enhance
graduate education, publishing and
IPR efforts
• Helping improve the availability and
content of theses and dissertations
• Educating ALL future scholars so they
can publish electronically and
effectively use digital libraries (i.e., are
Information Literate and can be more
expressive) -> support Open Access
UK Report of Aug. 2006
• EVALUATION OF OPTIONS FOR A UK
ELECTRONIC THESIS SERVICE
• Study report edited by Alma Swan
• Key Perspectives Ltd & UCL Library Services
• EThOS project (Electronic Theses Online
Service) - commissioned to develop a model
for a workable, sustainable and acceptable
national service for the provision of open
access to electronic doctoral theses.
EThOS: Stakeholders
• Academic registrars
• University administrators (graduate
schools)
• Librarians
• Repository managers (3; 2)
• Authors (or potential authors) of theses
and dissertations
Assessment of the organisational models
• Viability
– Distributed model: Dependent upon individual institutions’ capabilities and resources, which are highly variable
– Centralised model: Good, providing service provider selects correct business model and satisfies HEI concerns (on rights, liabilities, etc.)
– Mixed architecture model: Good, providing service provider selects correct business model and satisfies HEI concerns (on rights, liabilities, etc.)
• Disadvantages
– Distributed model: Dependent upon individual institutions’ capabilities and resources, which are highly variable. This would lead to a service of patchy quality for at least a decade. Potentially chaotic with respect to standards and consistency levels
– Centralised model: HEIs lose control to an extent and may lose some benefits in terms of PR and other institutional-purpose benefits that accrue with local service provision
– Mixed architecture model: Offers potential for inconsistencies unless well-managed by hub provider
• Advantages
– Distributed model: Self-organising, cheap, simple
– Centralised model: HEIs need only to provide access to e-theses, central service provider does the rest; standards applied across the board; guaranteed consistent access; scope for added-value services; one interface; a true national collection as well as a national gateway; easy to hook up to other national or international services
– Mixed architecture model: Gives the greatest flexibility to HEIs to select the most appropriate options; HEIs can retain control of selected elements; standards applied across the board; guaranteed consistent access; scope for added-value services; one interface (multiple sites of supply); national gateway; easy to hook up to other national or international services
• HEI community views
– Distributed model: Strong feeling against this option
– Centralised model: Second most popular option
– Mixed architecture model: Highest level of support for this option
• Comments
– Distributed model: No support in the HEI community
– Centralised model: Strong support within HEI community
– Mixed architecture model: Very strong support within HEI community
EThOS Survey: familiar with IPR issues related to e-theses
• 8% know very little
• 30% not very familiar
• 51% familiar
• 11% very familiar
EThOS Survey: my institution’s handling of PhD e-theses
• 83% not yet
• 11% from some students
• 5% from most students
• 1% from all students
EThOS Survey: my institution’s policy position on PhD e-theses
• 55% no policies yet
• 34% currently planning policies
• 11% has a policy
EThOS: Benefits
• Hugely increased visibility of UK doctoral
research output
• Resulting in increased usage and impact
of UK doctoral research output
• The opportunities for resulting new
research efforts and collaborations
Summary: Key Ideas
• Theorem 1: Supporters of Open Access
should support NDLTD.
• Theorem 2: 5S can guide us to better
support of Open Access.
Theorem 1: Supporters of Open
Access should support NDLTD - 1
• DLs will lead to enormous benefit at all
levels, from personal to global.
• An IR is a type of DL, in the middle of the
levels (requiring support from below, and
providing support for above levels).
• Having a DL at every university (i.e., IR)
greatly encourages Open Access.
Theorem 1: Supporters of Open
Access should support NDLTD - 2
• The easiest way to launch an IR at a
university is with ETDs.
• NDLTD is the lead world organization
promoting ETD activities.
• NDLTD’s goals are all in support of Open
Access and IRs.
Theorem 2: 5S can guide us to
better support of Open Access - 1
• 5S helps us think formally about Open
Access, hence clearly, hence to find focus.
• 5S helps us design and build DLs, hence IRs.
• Societies
– Individuals: members of institution, discipline
– Social influence can promote DL (re)use.
– Economic and political and social issues lead us
to a distributed architecture.
Theorem 2: 5S can guide us to
better support of Open Access - 2
• Distributed infrastructure + services lead us to
harvesting (vs. federation, gathering).
• 5S helps make harvesting a success:
– Streams of content flow from individuals.
– Structures: ETD-ms, (browsing) classification
– Spaces: indexes, interfaces
– Scenarios: submission, workflow, harvesting
– Societies (see above)
• More collaboration (social networks)
• Prestige is more widely spread.
• Access is more open.
DL Futures
• History
• People, Content, Tools
• Sustainable Infrastructure
• Future Work
• Links
• For More Information
People
• Digital librarians
• DL system developers
• DL system administrators
• DL managers
• DL collection development staff
• DL evaluators
• DL users