Architecting an Extensible Digital Repository

Anoop Kumar, Ranjani Saigal, Rob Chavez, Nikolai Schwertner
Tufts University, Medford, MA
Overview
• Background information on the evolution of TDL
• Design Requirements
• TDL Architecture
• Applications that interface with TDL
  – Tufts DL search
  – VUE
History of Digital Collections at Tufts
• About Tufts
  – Interdisciplinary
  – Focus on teaching and learning
• Digital Collections at Tufts
  – Perseus (Classics)
  – Tufts University Science Knowledgebase (TUSK-Medicine)
  – Artifact (Art History)
  – Digital Collections and Archives (DCA): Bolles, etc.
  – Other (Crime and Punishment)
Projects | Materials | Tools
Perseus DL | 50 million words, highly structured TEI-encoded XML texts of many types; 50,000 images | Perseus document management system and tools
DCA | 13 million words; 35,000 images; geospatial datasets; multimedia objects | Perseus document management system and tools
TUSK | 15,000 documents, including full-text syllabi, digital slide images, lecture recordings (audio and video), text notes, exam questions, evaluation forms, and bibliographies linked to full-text articles | Networked course management system interface
Artifact | 2,500 images linked to the Art History slide collection database, which contains 120,000 entries | On-demand viewing and searching with Internet-based adaptations of traditional learning aids, such as flashcards, for review and study
Why TDL? (Tufts Digital Library)
• The collections were expanding continuously, adding content in a variety of formats. The architecture of these libraries was not built to accommodate such expansion.
• Tufts needed a university-wide digital repository that could manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.
Designing TDL
• Digital Collections and Archives partnered with Academic Technology to create a digital library that can manage the content while supporting teaching and learning.
• Commitment to comply with standards in the library and open source communities.
• Ensure scalability, flexibility, reusability, extensibility and interoperability.
Design Requirements
• Ingest:
  – Ability to enforce archival standards
• Management:
  – Use of information packages to facilitate storage and dissemination
  – Ability to incorporate content models
• Persistence:
  – Use of persistent identifiers
  – Mapped URNs
Requirements | System Services
Unique and persistent identification of materials | Naming Service
Use of Archival Information Packages (AIP) | Digital Object Provider (DOP) Service (Fedora)
Use of Submission Information Packages (SIP) | Drop Box, Ingestion Service
Use of Dissemination Information Packages (DIP) | DOP Service
Authentication and integrity checking | DOP Service
Dissemination | Disseminators, Caching Service, Digital Library Application, Search Service
Access | Search Service and other applications
Tufts DL Architecture
[Figure: architecture diagram. Labels shown: Fedora, Client Application, Application Creation Service, Application Data, Application Interface, Drop Box, Search Interface, Naming Service, Ingestion Service, Search Indexing Service, Search Index.]
Legend: U - Users, M - Manager, A - Administrators
Components of TDL
Component | Role
Drop Box and Ingestion Service | Validation, Tagging, Preprocessing, Ingestion
Naming Service | Unique persistent identifiers mapped to objects (“tufts:dca:central:MS102.33.1345”)
Fedora Repository | Management and access framework for digital objects
Search and Indexing Service | Provides search mechanism
Application Creation Service | Provides mechanism for external applications to interface with repository
TDL Architecture
• Drop Box and Ingestion Service
• Naming Service
• Fedora Repository Service at Tufts
• Indexing Service and Search Engine
• Application Creation Service
Drop Box and Ingestion Service
Naming Service
• Assigns, reserves and resolves URNs
• URN Format
  – tufts:school name:owner:[collection:]item name
  – Example: tufts:dca:central:MS102.33.1345
• URN Properties
  – Provides unique ID to objects deposited into repository
  – Service assures resolution to unique resource
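
The URN format and the assign/reserve/resolve operations above can be illustrated with a minimal sketch. It assumes the collection segment is the only optional part and that no segment contains a colon. The class names, field names, method signatures and the in-memory registry are hypothetical and are not taken from the TDL implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TuftsUrn:
    """Parsed form of tufts:school:owner:[collection:]item (field names assumed)."""
    school: str
    owner: str
    item: str
    collection: Optional[str] = None

    def __str__(self) -> str:
        parts = ["tufts", self.school, self.owner]
        if self.collection is not None:
            parts.append(self.collection)
        parts.append(self.item)
        return ":".join(parts)


def parse_urn(urn: str) -> TuftsUrn:
    """Split a URN into its fields; the collection segment is optional."""
    parts = urn.split(":")
    if parts[0] != "tufts" or len(parts) not in (4, 5) or not all(parts):
        raise ValueError(f"not a valid TDL URN: {urn!r}")
    if len(parts) == 4:
        _, school, owner, item = parts
        return TuftsUrn(school, owner, item)
    _, school, owner, collection, item = parts
    return TuftsUrn(school, owner, item, collection)


class NamingService:
    """Hypothetical in-memory registry illustrating reserve, assign and resolve."""

    def __init__(self) -> None:
        self._locations: dict = {}

    def reserve(self, urn: str) -> None:
        key = str(parse_urn(urn))
        if key in self._locations:
            raise KeyError(f"URN already in use: {key}")
        self._locations[key] = None  # reserved, not yet bound to an object

    def assign(self, urn: str, location: str) -> None:
        key = str(parse_urn(urn))
        if self._locations.get(key) is not None:
            raise KeyError(f"URN already assigned: {key}")
        self._locations[key] = location

    def resolve(self, urn: str) -> str:
        location = self._locations.get(str(parse_urn(urn)))
        if location is None:
            raise LookupError(f"URN is not bound to an object: {urn}")
        return location
```

Parsing the example URN from the slide yields school "dca", owner "central" and item "MS102.33.1345", with no collection segment.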
Fedora Repository Service@Tufts
• Fedora - Key Features
• Repository at Tufts
• Content Models at Tufts
  – Objects, Behaviors and Disseminators
• Implementation Challenges
Flexible Extensible Digital Object Repository Architecture (Fedora)
• Support for heterogeneous data types
• Accommodation of new types as they emerge
• Aggregation of mixed, possibly distributed, data into complex objects
• The ability to specify multiple content disseminations of these objects
• The ability to associate rights management schemes with these disseminations
Repository Model
[Figure: repository model diagram. Components shown: User, Fedora, Caching Service, Processing Service, HTTP Server, Storage Device. Link labels: HTTP Request; High Bandwidth (20Mb TIFF); Medium Bandwidth (20Mb TIFF); Medium Bandwidth (200Kb JPEG).]
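
The diagram labels suggest that a caching service spares users from repeating an expensive transformation, such as deriving a 200Kb JPEG from a 20Mb TIFF. The sketch below shows one way such a cache could work, keyed by object URN and disseminator name. Every name in it is hypothetical, and it is not the caching service used at Tufts.

```python
from typing import Callable, Dict, Tuple


class DisseminationCache:
    """Hypothetical cache keyed by (object URN, disseminator name)."""

    def __init__(self, produce: Callable[[str, str], bytes]) -> None:
        self._produce = produce  # expensive call, e.g. an image conversion
        self._store: Dict[Tuple[str, str], bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, urn: str, disseminator: str) -> bytes:
        key = (urn, disseminator)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = self._produce(urn, disseminator)
        self._store[key] = result
        return result

    def invalidate(self, urn: str) -> None:
        """Drop every cached dissemination of one object, e.g. after re-ingest."""
        for key in [k for k in self._store if k[0] == urn]:
            del self._store[key]


def fake_transform(urn: str, disseminator: str) -> bytes:
    # Stand-in for the processing service; a real one would convert the image.
    return f"{disseminator} of {urn}".encode()
```

A second request for the same URN and disseminator is served from the store without calling the transformation again, and invalidating an object forces the next request to recompute it.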
Content Model (CM) Hierarchy
• Indexing Disseminators: getIndexTerms, getForIndexing, etc.
• Repository-Level Disseminators: getArchivalCopy, getPreview, getClass, etc.
• Text CM: getTOC, getChunksList, getChunk, etc.
• VUE CM: getConceptMap, getResource, etc.
• Image CM: getThumbnail, getAccessHigh, getImageStats, etc.
• Binary CM: getObject, getMIME, etc.
• Collection CM: getObjects, getInfo, etc.
• Specific Implementations (TEI text, EAD text, Encyclopedia, Directory, TIFF image, etc.)
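
One way to read the hierarchy is as a base type that carries the repository-level and indexing disseminators, with each content model adding its own. The sketch below illustrates that reading for two of the content models. The method names come from the slide; the signatures, return types, method bodies and the relationship between getIndexTerms and getForIndexing are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class ContentModel(ABC):
    """Base of the hierarchy: disseminators every object is assumed to offer."""

    def __init__(self, urn: str) -> None:
        self.urn = urn

    def getClass(self) -> str:
        # Repository-level disseminator: report which content model applies.
        return type(self).__name__

    @abstractmethod
    def getForIndexing(self) -> str:
        """Indexing disseminator: text handed to the indexing service."""

    def getIndexTerms(self) -> List[str]:
        # Indexing disseminator built on getForIndexing (assumed relationship).
        return self.getForIndexing().lower().split()


class TextCM(ContentModel):
    def __init__(self, urn: str, chunks: List[str]) -> None:
        super().__init__(urn)
        self._chunks = chunks

    def getChunksList(self) -> List[int]:
        return list(range(len(self._chunks)))

    def getChunk(self, index: int) -> str:
        return self._chunks[index]

    def getForIndexing(self) -> str:
        return " ".join(self._chunks)


class ImageCM(ContentModel):
    def __init__(self, urn: str, caption: str) -> None:
        super().__init__(urn)
        self._caption = caption

    def getForIndexing(self) -> str:
        # Images have no full text, so only descriptive metadata is indexed.
        return self._caption


def index_all(objects: List[ContentModel]) -> dict:
    """The indexer calls the same disseminator on every object, whatever its CM."""
    return {obj.urn: obj.getIndexTerms() for obj in objects}
```

Because the indexer depends only on the shared disseminators, a new content model can be added without changing the indexing code.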
Implementation Challenges
• Processing Large XML Documents
• Transforming Large Images
• Modeling Collections
• Advanced Search
• Customized Search
• Caching Disseminations
Indexing Service and Search Engine
• Indexing
  – Specialized Polymorphic Disseminators
• Implementation
  – Lucene
• Supported Types of Search
  – Basic Keyword
  – Advanced metadata based
• Accessing the service
  – HTTP GET/POST
  – SOAP
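
The difference between basic keyword search and metadata-based search can be shown with a toy inverted index. This sketch is plain Python rather than Lucene, and it does not reflect how the TDL search service is built. The field names and URNs in the usage note are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class ToyIndex:
    """Tiny inverted index: (field, term) -> set of object identifiers."""

    def __init__(self) -> None:
        self._postings: Dict[tuple, Set[str]] = defaultdict(set)

    def add(self, urn: str, fields: Dict[str, str]) -> None:
        for field, text in fields.items():
            for term in text.lower().split():
                self._postings[(field, term)].add(urn)
                self._postings[("*", term)].add(urn)  # catch-all for keyword search

    def keyword(self, query: str) -> Set[str]:
        """Basic keyword search: every term must occur in some field."""
        return self._match("*", query)

    def by_field(self, field: str, query: str) -> Set[str]:
        """Metadata-based search: every term must occur in the named field."""
        return self._match(field, query)

    def _match(self, field: str, query: str) -> Set[str]:
        result: Optional[Set[str]] = None
        for term in query.lower().split():
            hits = self._postings.get((field, term), set())
            result = set(hits) if result is None else result & hits
        return result or set()
```

For example, after adding one object with creator "Smith" and another whose title contains "Smith", `keyword("smith")` matches both, while `by_field("creator", "smith")` matches only the first.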
Application Creation Service
• An important design requirement for TDL was to allow current digital library applications to interface easily with TDL and to provide seamless access to its content from within their own environments.
• Current applications such as Perseus can interface with this service to allow their tools to disseminate the content that resides in TDL.
• The service has been designed not only to support current applications but also to accommodate the needs of future, yet-to-be-defined applications such as course management systems, learning tools and portals.
Applications Accessing TDL Content
• Tufts DL Search
• Visual Understanding Environment (VUE)
Visual Understanding Environment (VUE)
VUE Architecture
[Figure: VUE architecture diagram. Labels shown: VUE, Technical Infrastructure, OKI, OKI-FEDORA Bridge, DR API, DR Implementations, FEDORA, Digital Repository (two instances).]
Why TDL? (Tufts Digital Library)
• The collections are expanding continuously, adding content in a variety of formats. The current architecture of these libraries is not built to accommodate such expansion.
• Tufts needs a university-wide digital repository that can manage the ever-increasing content while continuing to serve discipline-specific needs and leveraging existing and new tools and services.
Future Direction
• Authentication and authorization service
• Customization and enhancement of Fedora@Tufts to address a wide variety of needs
• An automated browsing service for the repository