Digital Repositories for Preservation and Access

HATHI TRUST
A Shared Digital Repository
Digital Directions 2013
Jeremy York
July 22, 2013
Unless otherwise noted, these slides and their contents are licensed under a Creative Commons
Attribution Unported License.
Digital repositories
• Primary mission is to preserve content
• Perform actions to this end
Reasons to preserve content
• For access
• Guard against threats to content
– Digitization is an accepted method of preservation
reformatting
– Digital content deteriorates and is fragile
Reasons to provide access
• Meet needs of designated community
• Check on integrity of content
• Content that is accessible is more likely to be
valued and preserved in the future
Reasons access might not be offered
• Copyright
• Privacy
• Licensing
• Needs of user community
– Content available elsewhere
• Technical limitations
– Networking and storage requirements
A number of models
• Full user access to preserved digital objects
• No end-user access to digital objects
• Delayed or triggered user access to digital
objects
• Partial access to digital objects
Requirements to preserve content
• OAIS
– “An OAIS is an Archive, consisting of an
organization...of people and systems that has
accepted the responsibility to preserve
information and make it available for a Designated
Community.” [does not imply unrestricted access]
OAIS
• Support information model
– Define target of preservation (content data and representation
information)
– Define metadata needed to preserve, identify, contextualize
information (PDI)
• Fulfill responsibilities
– Accept information from Producers
– Obtain control sufficient to preserve
– Ensure understandable to designated community
– Ensure preservation
– Make available to designated community with information
supporting authenticity
Ensure preservation
• Some strategies:
– Transformation
– Validation
– Checks on integrity
– Replication
– Choice of formats
– Migration
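Two of the strategies above, checks on integrity and replication, can be illustrated with a small fixity check. This is a minimal sketch, not any repository's actual procedure: the choice of SHA-256, the file names, and the helper names `sha256_of`, `check_fixity`, and `demo` are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, recorded_digest):
    """True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest


def demo():
    """Record a digest, replicate the file, then corrupt the replica."""
    with tempfile.TemporaryDirectory() as tmp:
        master = Path(tmp) / "master.txt"      # hypothetical file names
        replica = Path(tmp) / "replica.txt"
        master.write_bytes(b"sample page text")
        recorded = sha256_of(master)           # stored as fixity metadata

        replica.write_bytes(master.read_bytes())   # replication
        intact = check_fixity(replica, recorded)

        replica.write_bytes(b"sample page tex7")   # simulated bit rot
        damaged = check_fixity(replica, recorded)
        return intact, damaged
```

In practice the recorded digest would live in the preservation metadata (the Fixity part of PDI), and a failed check on one copy would be repaired from a replica that still verifies.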
TRAC
• Starts with “a mission to provide reliable,
long-term access to managed digital resources
to its designated community, now and into the
future”
• Encompasses
– Organizational Infrastructure
– Digital Object Management
– Technical Infrastructure
TRAC (2)
• Borrows vocabulary from OAIS
• Adapts ideas for applying criteria from nestor
and Digital Curation Centre
– Documentation (evidence)
– Transparency
– Adequacy
– Measurability
Preserve Content
[Diagram of the elements of preservation: Mission, Designated Community, OAIS, TRAC, Provenance, Reference, Context, Fixity, Access Rights, Content Data, Representation Information, Preservation Actions, Integrity, Authenticity, Reliability, Transparency, Documentation, Adequacy, Measurability, Organizational Infrastructure, Digital Object Management, Technical Infrastructure]
Where does access come in?
• Some level of access is necessary
– Management, integrity
• What is preserved may not be what is most
useful to the end user
• Implications across the repository
Content formats
• Can the content you are preserving be delivered over the
Web?
– Will you be storing derivative files?
– Is some kind of transformation needed?
– Do the files offer consistent functionality?
• Implications for scale of repository, access systems, changes
to services
• In HathiTrust:
– Limited to 3 formats, largely uniform in technical characteristics
• ITU G4 TIFF
• JPEG2000
• Unicode (with and without coordinates)
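Keeping to a small set of formats makes it practical to check what a file claims to be before it is stored or delivered. The sketch below is an illustration only: it inspects the well-known TIFF and JPEG 2000 (JP2) file signatures and does not verify G4 compression or any other technical characteristic. Treating "Unicode" text as UTF-8, and the function name `sniff_format`, are assumptions.

```python
# File signatures: TIFF begins with "II*\0" (little-endian) or "MM\0*"
# (big-endian); a JP2 file begins with a fixed 12-byte signature box.
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")


def sniff_format(data):
    """Classify raw bytes as 'tiff', 'jp2', 'text', or 'unknown'."""
    if data[:4] in TIFF_MAGIC:
        return "tiff"
    if data[:12] == JP2_SIGNATURE:
        return "jp2"
    try:
        data.decode("utf-8")  # assumption: text files are UTF-8 encoded
    except UnicodeDecodeError:
        return "unknown"
    return "text"
```

A signature check of this kind only confirms the container type; a fuller validation step would also examine the technical metadata inside the file.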
Storage of information about content
• Is information about object adequately
available for both preservation and access?
– Structural information
– Preservation information with implications for
interface
• HathiTrust uses METS as a wrapper
– Available for preservation and access
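To show how one METS file can serve both purposes, the sketch below reads file checksums (useful for preservation) and page order (useful for an access interface) from the same document. The sample XML is a hypothetical, heavily reduced METS written for this example, with placeholder checksums and file names; it is not an actual HathiTrust METS file, and `read_mets` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical minimal METS: one page with an image file and a text file.
SAMPLE_METS = """\
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="image">
      <mets:file ID="IMG1" MIMETYPE="image/jp2"
                 CHECKSUM="00000000000000000000000000000001" CHECKSUMTYPE="MD5">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.jp2"/>
      </mets:file>
    </mets:fileGrp>
    <mets:fileGrp USE="ocr">
      <mets:file ID="TXT1" MIMETYPE="text/plain"
                 CHECKSUM="00000000000000000000000000000002" CHECKSUMTYPE="MD5">
        <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="SYSTEM" xlink:href="00000001.txt"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="volume">
      <mets:div TYPE="page" ORDER="1">
        <mets:fptr FILEID="IMG1"/>
        <mets:fptr FILEID="TXT1"/>
      </mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>
"""


def read_mets(mets_xml):
    """Return (checksums by file name, pages as (order, [file names]))."""
    root = ET.fromstring(mets_xml)
    href_by_id = {}
    checksums = {}
    for file_el in root.iterfind(".//mets:file", NS):
        href = file_el.find("mets:FLocat", NS).get(XLINK_HREF)
        href_by_id[file_el.get("ID")] = href
        checksums[href] = (file_el.get("CHECKSUMTYPE"), file_el.get("CHECKSUM"))
    pages = []
    for div in root.iterfind(".//mets:div[@TYPE='page']", NS):
        hrefs = [href_by_id[p.get("FILEID")] for p in div.iterfind("mets:fptr", NS)]
        pages.append((int(div.get("ORDER")), hrefs))
    return checksums, sorted(pages)
```

The checksums feed integrity checks, while the ordered page list is the structural information a page-turning interface needs.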
Content Package
[Diagram labels: Zip (images, text, Source METS); HT METS]
Architecture
../uc1/pairtree_root/b3/54/34/86/b34543486
b34543486.zip
b34543486.mets.xml
[Diagram labels: images, text, Source METS, HT METS]
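The directory layout above follows the Pairtree convention, in which an identifier is split into two-character segments that become nested directories. The following is a simplified sketch of that idea: it omits the identifier-cleaning (character escaping) step defined by the Pairtree specification, and the namespace `demo` and identifier `ab123456` are made-up values, not real repository objects.

```python
from pathlib import PurePosixPath


def pairtree_segments(identifier, width=2):
    """Split an identifier into fixed-width directory segments."""
    return [identifier[i:i + width] for i in range(0, len(identifier), width)]


def object_paths(namespace, identifier):
    """Return the (zip, METS) paths for an object under its namespace."""
    object_dir = PurePosixPath(
        namespace, "pairtree_root", *pairtree_segments(identifier), identifier
    )
    return (
        object_dir / f"{identifier}.zip",
        object_dir / f"{identifier}.mets.xml",
    )


# Hypothetical namespace and identifier, for illustration only.
zip_path, mets_path = object_paths("demo", "ab123456")
print(zip_path)   # demo/pairtree_root/ab/12/34/56/ab123456/ab123456.zip
print(mets_path)  # demo/pairtree_root/ab/12/34/56/ab123456/ab123456.mets.xml
```

Because the path is computed from the identifier alone, any service can locate an object's files without consulting a separate index.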
Storage
• Does the storage system support needs for
ingest and access?
• In HathiTrust:
– Need to have fast access to repository systems to
support services
Security
• Data Integrity
– Checksum validation, digital object provenance
• Physical security
– Biometric door systems, locked racks
• Network security
– Firewalling, vulnerability scanning
• Application security
– Developer best practices, input validation
• Access control…
Differential access to content
• Rights database
– Ensures appropriate access
• Holdings database
– Facilitates lawful uses of materials
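One way to picture how a rights database and a holdings database might combine is a single access decision per request. Everything in this sketch is hypothetical: the rights codes, the `has_lawful_use_exception` flag, the volume and institution names, and the rule itself are invented for illustration and do not describe HathiTrust's actual policies or data model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Access(Enum):
    FULL_VIEW = "full view"
    SEARCH_ONLY = "search only"


@dataclass(frozen=True)
class User:
    institution: Optional[str] = None
    has_lawful_use_exception: bool = False  # hypothetical flag


# Hypothetical stand-ins for the rights and holdings databases.
RIGHTS_DB = {"vol-001": "public_domain", "vol-002": "in_copyright"}
HOLDINGS_DB = {"vol-002": {"univ-a"}}


def decide_access(volume_id, user):
    """Combine rights status and institutional holdings into one decision."""
    # Unknown volumes default to the most restrictive status.
    status = RIGHTS_DB.get(volume_id, "in_copyright")
    if status == "public_domain":
        return Access.FULL_VIEW
    held = user.institution in HOLDINGS_DB.get(volume_id, set())
    if held and user.has_lawful_use_exception:
        return Access.FULL_VIEW
    return Access.SEARCH_ONLY
```

The point of the sketch is the separation of concerns: rights data says what may be shown in general, and holdings data narrows or widens that for a particular user's institution.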
Authentication/Authorization
• Mechanisms to enable differential access,
ensure security and appropriate use
User services
• Bibliographic and full-text search indexes
• Collection-building capabilities
• User interfaces
APIs and Datasets
• Data API
• Bibliographic API
• OAI
• “Hathifiles”
• Datasets
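As an example of how such interfaces are used, the sketch below builds OAI-PMH `ListRecords` request URLs for harvesting metadata. The verb and argument names come from the OAI-PMH protocol; the base URL is a placeholder and not a real HathiTrust endpoint, and `list_records_url` is an illustrative helper. No network request is made.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the repository's OAI base URL.
BASE_URL = "https://repository.example.org/oai"


def list_records_url(metadata_prefix="oai_dc", from_date=None,
                     set_spec=None, resumption_token=None):
    """Build a ListRecords URL; a resumption token excludes other arguments."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # OAI-PMH treats resumptionToken as an exclusive argument.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
        if from_date:
            params["from"] = from_date
        if set_spec:
            params["set"] = set_spec
    return f"{BASE_URL}?{urlencode(params)}"


print(list_records_url(from_date="2013-07-01"))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2013-07-01
```

A harvester would fetch the first URL, then keep requesting with each returned resumption token until the response contains none.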
More
• Quality
• User Support
• Correction
Provide Access
[Diagram of the elements of access: Content Formats, Content Package, Architecture, Storage, Security, Authentication, Authorization, Differential Access, Copyright/Agreements, Lawful Uses, Indexes, Services / User Interfaces, APIs and Datasets, Information Quality, User Support, Correction]
Preservation and Access
[Diagram combining the two previous diagrams around Mission and Designated Community. Preservation: OAIS, TRAC, Provenance, Reference, Context, Fixity, Access Rights, Content Data, Representation Information, Preservation Actions, Integrity, Authenticity, Reliability, Transparency, Documentation, Adequacy, Measurability, Organizational Infrastructure, Digital Object Management, Technical Infrastructure. Access: Content Formats, Content Package, Architecture, Storage, Security, Authentication, Authorization, Differential Access, Copyright/Agreements, Lawful Uses, Indexes, Services / User Interfaces, APIs and Datasets, Information Quality, User Support, Correction]
Thank you!
How to find out more
• About: http://www.hathitrust.org/about
• Twitter: http://twitter.com/hathitrust
• Facebook: http://www.facebook.com/hathitrust
• Monthly newsletter:
– http://www.hathitrust.org/updates
– RSS: http://www.hathitrust.org/updates_rss
• Contact us: feedback@issues.hathitrust.org
• Blogs: http://www.hathitrust.org/blogs
– Large-scale Search
– Perspectives from HathiTrust