Digital Preservation Roadmap for Harvard (2014

An Update on
Harvard Library’s
Video Preservation Service
Lamont Forum Room
11 September, 2015
David Ackerman, Abigail Bordeaux and Andrea Goethals
Today
1. Project Context (Andrea)
2. Media Obsolescence (Dave)
3. Video Analysis (Andrea)
4. Video Development (Abigail)
5. AV Materials Working Group (Dave)
6. Media Preservation Services (Dave)
7. Timing (Andrea)
8. Q&A
Andrea
PROJECT CONTEXT
Media Preservation Services
• Organizationally: within HL Preservation Services
• Services: media (audio, video, film) restoration and reformatting, DRS depositing, consultations, specifications for and liaison to external vendors
• Relation to DRS: depositor, AV format expert
Digital Preservation Services
• Organizationally: within HL Preservation Services
• Services: DRS oversight and management, monitor and maintain usability of DRS content, represent user preservation needs, consultations, guidelines
• Relation to DRS: business owner
Library Technology Services
• Organizationally: within HUIT
• Services: plan, develop, maintain library technology; provide training and support; data and reporting services; project consultations
• Relation to DRS: technology owner
DRS “Support”
• Format allowed in at least one DRS “content
model”
• Repository tools “know” the format
• Usable now (e.g. through delivery services)
• Preservation staff reasonably certain it can be
made usable on an ongoing basis via
interventions
Formats Supported
[Timeline figure, 2000 to 2015, showing when formats gained DRS support. Formats shown: ICC, JP2, Targets, Web Harv., JPEG, XML, PCD, TIFF, Text, GIF, RA, AIFF, WAV, SMIL, ESRI WFs, GZIP, Opaque, ZIP, PDF, Email]
DRS Format Composition by File Count (~61 Million Files)
• JP2: 33%
• TEXT: 32%
• TIFF: 24%
• XML: 6%
• JPEG: 4%
• ZIP: 1.4%
• Other: 0.22%
• PDF: 0.07%
• WAVE: 0.06%
• GZIP: 0.05%
• RealAudio: 0.02%
• PCD: 0.01%
• AIFF: 0.0005%
• GIF: 0.00004%
• ICC: 0.00004%
Chart legend categories: image, text, audio, other
DRS Format Composition by Size (~185 TB per Copy)
• JP2: 34%
• TIFF: 30%
• ZIP: 27%
• WAVE: 6%
• GZIP: 2%
• JPEG: 1%
• RealAudio: 0.24%
• PDF: 0.22%
• XML: 0.08%
• AIFF: 0.06%
• TEXT: 0.05%
• PCD: 0.02%
• Other: 0%
Chart legend categories: image, text, audio, other
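Read together, the file-count and size charts imply very different typical file sizes per format. The following is a minimal sketch of that arithmetic, assuming decimal terabytes and the approximate totals of 61 million files and 185 TB per copy. The function name and the selection of formats are illustrative, and the results are only as precise as the rounded chart percentages.

```python
# Rough mean file size per format, derived from the rounded chart shares.
TOTAL_FILES = 61_000_000      # ~61 million files (approximate)
TOTAL_BYTES = 185 * 10**12    # ~185 TB per copy, assuming decimal terabytes

# format: (share of file count in %, share of size in %), as read from the charts
SHARES = {
    "JP2": (33, 34),
    "TIFF": (24, 30),
    "TEXT": (32, 0.05),
    "WAVE": (0.06, 6),
    "ZIP": (1.4, 27),
}


def mean_file_size(fmt: str) -> float:
    """Return the estimated mean size in bytes of one file of the given format."""
    count_pct, size_pct = SHARES[fmt]
    files = TOTAL_FILES * count_pct / 100
    size = TOTAL_BYTES * size_pct / 100
    return size / files


if __name__ == "__main__":
    for fmt in SHARES:
        print(f"{fmt:5s} ~{mean_file_size(fmt) / 10**6:,.3f} MB per file")
```

On these figures a WAVE file averages roughly 300 MB while a TEXT file averages a few kilobytes, which is why audio barely registers by file count but is visible by size.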
Born Digital Formats in Harvard Libraries
[Bar chart: number of libraries (out of 21 that answered), from 0 to 20, that already have or will have in 3 years each type of content]
• Moving images / Video
• Audio recordings (including podcasts)
• Texts / Documents (e.g., Word, PDF, TXT)
• Still / 2D Images (e.g., TIFF, JPEG, etc.)
• Presentations (e.g., PPT, Keynote)
• Websites / Blogs
• Spreadsheets (e.g., Excel, Calc, Numbers)
• Databases (e.g. Oracle, MySQL, Access databases)
• Executable files (e.g., software other than computer…
• Email
• Datasets (other than GIS data, e.g. SAS, SPSS)
• 3D Images
• Geographic Information Systems (GIS) data (e.g.,…
• Drawings / Vector graphics (e.g., CAD/CAM,…
• Social media (e.g., Facebook pages, Twitter accounts)
• Enterprise systems data (e.g., data exported from an…
• Computer games
• Other type
Source: HL Preservation Needs Assessment (2013)
DRS Format Requests (2004 -)
• Video: 22%
• Office Documents: 14%
• Vector Graphics: 14%
• Other Still Images: 7%
• Datasets: 6%
• Databases: 6%
• Software: 6%
• DNG: 6%
• Ebooks: 4%
• 3D Models: 4%
• Forensic, OCR Text, Newspaper, GIS, Articles, Web Sites: 1% to 3% each
Chart last updated: 7/2015 (69 requests)
Format Support Gap
[Figure: the Formats Supported timeline (2000 to 2015) with further format labels added. Labels shown in addition to JP2, Targets, JPEG, XML, PCD, TIFF, Text, GIF, RA, AIFF, WAV, SMIL and ESRI WFs: Arts., ICC, DBs, SW, Video, PDF, Email, SHP, More Audio, Web Harv., Data sets, XLS, Disk Imgs, Vector, GZIP, Opaque, Word, PPT, EPUB, Pyth. NBs, ZIP, DNG, CAD, 3D, News.]
2008: Stop-Gap Solution
• “Opaque objects and containers”
• Any format, BUT...
– Only bit-level preservation
– No delivery
– Very coarse description
– Less attention by preservation staff
• Moderate uptake: fewer than 20,000 Opaque containers
Adding Format Support – Old Workflow
• All analysis & development done in-house
– by existing staff
– concurrently with other projects / operations
– intermittently (requiring re-familiarization)
• Sometimes stalled by lack of expertise
• Ad-hoc, undocumented process
New Fast-Tracking Workflow
• 3-year project enabled by Arcadia
• Formats:
– video
– word processing formats
– CAD (2D and 3D)
– disk images
– video image sequences
– RAW camera images
• Goal: create a faster format support workflow that can be repeated
• New process working with consultants
Dave
MEDIA OBSOLESCENCE
Legacy Media (Mechanical)
Legacy Media (Magnetic)
Legacy Media (Optical)
Image source: https://www.attingo.com/img/Professionelle_Datenrettung/Datentraeger/cd_dvd_br-analyse.jpg
Legacy Media
Failure Mode
Examples
Media Degradation
“Degradation is well observed by custodians of
media collections although only partly
understood due to the scarcity of scientific data
in this area.”
http://www.indiana.edu/~medpres/documents/iub_media_preservation_survey_FINALwww.pdf p33
Media Obsolescence
“Obsolescence has long been a concern, but has
risen to the forefront in the last five years due to
the accelerating loss of technologies supporting
various formats.”
http://www.indiana.edu/~medpres/documents/iub_media_preservation_survey_FINALwww.pdf p33
Rendering Media
“All audio and video documents are machine readable formats …
equipment has a major influence on the integrity and life
expectancy of the carriers. As a matter of principle, only the
most advanced equipment of the latest generation that provides
the gentlest of handling should be used to replay carriers. This
equipment must fully comply with historical format parameters.”
Schüller, Dietrich. Audio and Video Carriers: Feb 2008.
http://www.tape-online.net/docs/audio_and_video_carriers.pdf
Risk of Loss of Content
• Catastrophic failure of a recording from degradation so that no content is
recoverable
• Partial failure from degradation so that only parts of content are
recoverable
• Diminishment from degradation so that content is recoverable but at
lesser quality
• Inability to optimally reproduce, or reproduce at all, a recording due to
unavailability of playback machines, spare parts, repair expertise, or
playback expertise
• Inability to preserve collections because it has become prohibitively
expensive due to the extreme scarcity of playback machines and technical
playback expertise
http://www.indiana.edu/~medpres/documents/iub_media_preservation_survey_FINALwww.pdf p33
Window of Opportunity
“Studies have concluded that many
analog audio recordings must be
digitized within the next 15 to 20
years—before sound carrier
degradation and the challenges of
acquiring and maintaining playback
equipment make the success of
these efforts too expensive or
unattainable.”
Library of Congress National Recording Preservation Plan (p. 7): December 2012.
http://www.loc.gov/rr/record/nrpb/PLAN%20pdf.pdf
Harvard Collections
SAVE Survey Snapshot from Aug. 2014
[Pie chart of VIDEO, AUDIO and FILM holdings; slice values: 10.58%, 28.87%, 60.55%]
Andrea
VIDEO ANALYSIS – FORMATS,
METADATA & TOOLS
Analysis Workflow
1. Divide up analysis responsibilities (in-house, consultants)
2. Determine format analysis criteria
3. Analyze formats
4. Create format profiles
5. Determine preservation strategy
6. Analyze metadata
7. Design DRS content model
8. Analyze tools
Video – Format Criteria
• Generic criteria, prioritized
– Very important (ex: Dependency on a single organization
or company) – 9 criteria
– Somewhat important (ex: standardized) – 9 criteria
– Not very important (ex: descriptive metadata support) – 10
criteria
• Format-specific – 7 criteria, examples:
– Ability to encode in true lossless compression
– Max resolution
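One way to apply prioritized criteria is a weighted tally. The sketch below is illustrative only: the tier weights, the function name and the tier keys are hypothetical and are not taken from the analysis. Only the three priority tiers and their criteria counts (9, 9 and 10) come from the slide.

```python
# Hypothetical weight per priority tier; the actual analysis may weigh criteria differently.
TIER_WEIGHTS = {"very": 3, "somewhat": 2, "not_very": 1}

# Number of generic criteria in each tier, as listed on the slide.
TIER_SIZES = {"very": 9, "somewhat": 9, "not_very": 10}


def weighted_score(criteria_met: dict[str, int]) -> float:
    """Return a score between 0 and 1 from the number of criteria met per tier."""
    for tier, met in criteria_met.items():
        if not 0 <= met <= TIER_SIZES[tier]:
            raise ValueError(f"{tier}: {met} is outside 0..{TIER_SIZES[tier]}")
    earned = sum(TIER_WEIGHTS[t] * criteria_met.get(t, 0) for t in TIER_SIZES)
    possible = sum(TIER_WEIGHTS[t] * TIER_SIZES[t] for t in TIER_SIZES)
    return earned / possible
```

A format that meets every criterion scores 1.0, and tiers omitted from the input count as zero criteria met.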
Video – Format Matrix (Partial View)
Video Preservation Strategy
• Prefer several formats as archival
– uncompressed, JPEG 2000, MPEG-2 and DV (for DV tape)
– provide a video reformatting service for these
• Accept a few popular proprietary formats but expect to
fast-track migrations for them
– DNxHD, ProRes
• Few wrapper formats (QT, MXF)
• One delivery format (H.264)
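The strategy above amounts to a small lookup from format to handling. The sketch below restates it in code. The category sets mirror the slide, while the lowercase spellings, function names and return strings are illustrative assumptions rather than DRS identifiers.

```python
# Categories as listed in the preservation strategy; spellings are illustrative.
ARCHIVAL = {"uncompressed", "jpeg 2000", "mpeg-2", "dv"}  # DV only for DV tape sources
ACCEPTED_PROPRIETARY = {"dnxhd", "prores"}
DELIVERY = {"h.264"}
WRAPPERS = {"qt", "mxf"}


def classify_codec(codec: str) -> str:
    """Return the strategy category for a video encoding name."""
    name = codec.strip().lower()
    if name in ARCHIVAL:
        return "archival"
    if name in ACCEPTED_PROPRIETARY:
        return "accepted; fast-track migration expected"
    if name in DELIVERY:
        return "delivery"
    return "not in strategy"


def is_supported_wrapper(wrapper: str) -> bool:
    """Return True if the wrapper is one of the few listed in the strategy."""
    return wrapper.strip().lower() in WRAPPERS
```

For example, `classify_codec("ProRes")` returns the fast-track category, and `is_supported_wrapper("MXF")` returns `True`.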
Video – Metadata Analysis
• Technical metadata
– EBU Core 1.5 (aligns well with AES-60, structure mirrors
MediaInfo’s output)
• Source metadata
– A revised UTVideoSrc (native suitability to physical media,
right amount of detail)
• Process history
– A revised reVTMD (specific, simple, sufficient)
• Future work: descriptive metadata
Video – DRS Content Model
VIDEO OBJECT = 1 Object Descriptor + 1..n Video Files
(1 metadata file and 1 or more derivative video files)
• HAS_SOURCE: 0..n Video Files
• HAS_SOURCE: 0..n Video Files
Relationships between a VIDEO OBJECT and other objects:
• HAS_DOCUMENTATION: VIDEO EDIT DECISION LIST OBJECT
• HAS_LARGER_CONTEXT: DISK IMAGE OBJECT
• HAS_SUPPLEMENT: CLOSED CAPTION DATA OBJECT, SUBTITLE DATA OBJECT, POSTER FRAME OBJECT, DOUBLE SYSTEM AUDIO OBJECT
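The content model can be pictured as a small data structure. The sketch below is a hypothetical rendering, not the DRS schema: the class and field names are invented for illustration, the direction of HAS_SOURCE (a derivative pointing at the file it was made from) is an assumption, and the related objects are represented only by string identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class VideoFile:
    name: str
    # HAS_SOURCE: 0..n other video files (direction assumed: derivative -> source)
    sources: list["VideoFile"] = field(default_factory=list)


@dataclass
class VideoObject:
    descriptor: str               # exactly 1 object descriptor
    video_files: list[VideoFile]  # 1..n video files
    # Related objects, referenced here by hypothetical identifiers
    documentation: list[str] = field(default_factory=list)   # HAS_DOCUMENTATION
    larger_context: list[str] = field(default_factory=list)  # HAS_LARGER_CONTEXT
    supplements: list[str] = field(default_factory=list)     # HAS_SUPPLEMENT

    def __post_init__(self) -> None:
        # Enforce the 1..n cardinality on video files.
        if not self.video_files:
            raise ValueError("a video object needs at least one video file")
```

A master file and a delivery derivative would then be two `VideoFile` instances in one `VideoObject`, with the derivative listing the master in `sources`.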
Ex. – Video – Tool Analysis
• Incorporate MediaInfo into FITS (fitstool.info)
• Make FITS track-aware
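To illustrate what "track-aware" could mean in practice, the sketch below groups the tracks in a MediaInfo JSON report by type. It does not use FITS. The sample report is hand-written and abbreviated, and its layout (a `media.track` list with an `@type` key per track) is assumed from the output of `mediainfo --Output=JSON`, so it should be checked against the MediaInfo version in use. The function name is hypothetical.

```python
import json
from collections import defaultdict

# Hand-written, abbreviated stand-in for a MediaInfo JSON report (assumed layout).
SAMPLE_REPORT = """
{"media": {"@ref": "example.mov", "track": [
  {"@type": "General", "Format": "MPEG-4"},
  {"@type": "Video", "Format": "AVC"},
  {"@type": "Audio", "Format": "AAC"},
  {"@type": "Audio", "Format": "AAC"}
]}}
"""


def tracks_by_type(report_json: str) -> dict:
    """Group the tracks of a MediaInfo-style JSON report by their @type value."""
    report = json.loads(report_json)
    grouped = defaultdict(list)
    for track in report["media"]["track"]:
        grouped[track.get("@type", "Unknown")].append(track)
    return dict(grouped)


if __name__ == "__main__":
    for track_type, tracks in tracks_by_type(SAMPLE_REPORT).items():
        print(track_type, [t.get("Format") for t in tracks])
```

Keeping each track as its own record, rather than flattening the file to one set of values, is what lets a file with several audio tracks be described accurately.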
Abigail
VIDEO DEVELOPMENT - ROADMAP
Development Timeline
[Timeline, Jan 2015 to Jan 2016: Video developer joins LTS; Video release 1 in Q3 FY16]
Development for Release 1: single video file plus optional derivatives
• Deposit via Batch Builder
• Manage via Web Admin
• Deliver through Streaming Delivery Service
Development Timeline
[Timeline, Jan 2016 to Jan 2017: Video release 1 in Q3 FY16; Video release 2 in Q1 FY17]
Development for Release 2:
• Multiple files and playlists
• Enhancements to caption support
• Delivery enhancements
Additional development (pending prioritization):
• Audio description support
• Multiple audio track support
• Poster Frame deposits
• And more
Dave
AV MATERIALS WORKING GROUP
AVWG Charge
The Stewardship Standing Committee charges the Audiovisual Materials
Working Group (AVWG) to gather data and make recommendations regarding
the priorities for digitizing audio and video content at Harvard and to make
recommendations for the tools, policies, and resources needed for AV
digitization, delivery, and preservation.
Out of scope: motion picture film (strategies already in place), photographs
and other visual materials that are not considered time-based media, and
commercially available audio and video content.
Summary of Activities
• Provide feedback to Library Technology Services (LTS)
to aid in the development of basic and advanced
delivery services for video content.
• Make recommendations on how to develop priorities
for digitization of audio and video content.
• Develop recommendations for descriptive metadata
for video content.
Informal Survey for Video Deposit
• Data gathered from 5 Harvard repositories
represented in the AVWG
• Five year forecast
How many hours of digital video do you currently have?
[Bar chart: hours (axis 0 to 350) for Rep. 1 through Rep. 5]
How many hours of analog video do you plan to digitize and deposit to the DRS over the next 5 years?
[Bar chart: hours (axis 0 to 2000) for Rep. 1 through Rep. 5, broken down by Year 1 through Year 5]
Dave
MEDIA PRESERVATION SERVICES
Expanding Services
• Video Deposit (DRS)
• Video Technical Metadata
• Video Digitization
– 1” Open Reel Type C
– U-Matic
– Digi Beta
– Beta SP
– Betamax
– SVHS/VHS
– DVCAM/DVCPRO/DV
– Video-8
Expanding Services
Formats
• Mathematically lossless JPEG2000
• QuickTime Uncompressed
• Apple ProRes
• Avid DNxHD
• H.264
Metadata Standards
• EBUCore
• UTVideoSrc
• reVTMD
Andrea
TIMING
Timing (now – end of 2015)
• Develop guidelines
– Early October 2015: recommendations for setting AV
digitization priorities (AV Materials WG)
– Rest of 2015: Discussion (Steering Committees and
Library Leadership Team)
• Analyze cost model for AV material
– Analysis and recommendations with goal to provide
direction for FY17 budgets (Library Finance, LTS,
Preservation Services)
Timing (Rest of FY16)
• MPS reformatting and DRS piloting
– 3 collections with funding deadlines including Hidden
Collections with A/V materials
– Monitor impact (network transfers, storage, delivery
performance, etc.)
– Gain experience (efficient workflows, staff expertise)
– Draft guidelines (estimating size and cost,
specifications for vendors)
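For the size-estimating guidelines, a first approximation is duration multiplied by bitrate. The sketch below assumes a constant bitrate and decimal terabytes. The function and parameter names are hypothetical, and the 100 Mbit/s figure in the usage note is a round illustrative number, not a DRS or MPS specification.

```python
def estimate_storage_tb(hours: float, mbit_per_s: float, copies: int = 1) -> float:
    """Estimate storage in decimal TB for video at a constant bitrate.

    hours       total running time of the video
    mbit_per_s  average bitrate in megabits per second (assumed constant)
    copies      number of stored copies
    """
    seconds = hours * 3600
    total_bytes = seconds * mbit_per_s * 1_000_000 / 8  # bits -> bytes
    return copies * total_bytes / 10**12


if __name__ == "__main__":
    # Illustrative only: 1,000 hours at an assumed 100 Mbit/s.
    print(f"{estimate_storage_tb(1000, 100):.1f} TB per copy")
```

At the assumed 100 Mbit/s, 1,000 hours comes to 45 TB per copy. Real estimates would substitute the measured bitrate of each target format and add audio, metadata and derivative files.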
Timing (FY17 -)
• Open up video reformatting and DRS deposit
service
• Refine cost model for video
• Guidelines for prioritizing AV digitization
Franziska
Q&A