An Update on Harvard Library’s Video Preservation Service
Lamont Forum Room
11 September 2015
David Ackerman, Abigail Bordeaux and Andrea Goethals

Today
1. Project Context (Andrea)
2. Media Obsolescence (Dave)
3. Video Analysis (Andrea)
4. Video Development (Abigail)
5. AV Materials Working Group (Dave)
6. Media Preservation Services (Dave)
7. Timing (Andrea)
8. Q&A

Andrea
PROJECT CONTEXT

Media Preservation Services
• Organizationally: within HL Preservation Services
• Services: media (audio, video, film) restoration and reformatting, DRS depositing, consultations, specifications for and liaison to external vendors
• Relation to DRS: depositor, AV format expert

Digital Preservation Services
• Organizationally: within HL Preservation Services
• Services: DRS oversight and management, monitor and maintain usability of DRS content, represent user preservation needs, consultations, guidelines
• Relation to DRS: business owner

Library Technology Services
• Organizationally: within HUIT
• Services: plan, develop, maintain library technology; provide training and support; data and reporting services; project consultations
• Relation to DRS: technology owner

DRS “Support”
• Format allowed in at least one DRS “content model”
• Repository tools “know” the format
• Usable now (e.g. through delivery services)
• Preservation staff reasonably certain it can be made usable on an ongoing basis via interventions

Formats Supported
[Timeline chart, 2000 to 2015, of formats added to the DRS. Labels: ICC, JP2, Targets, Web Harv., JPEG, XML, PCD, TIFF, Text, GIF, RA, AIFF, WAV, SMIL, ESRI WFs, GZIP, Opaque, ZIP, PDF, Email]

DRS Format Composition by File Count (~61 Million Files)
[Pie chart; legend: image, text, audio, other]
JP2 33%, TEXT 32%, TIFF 24%, XML 6%, JPEG 4%, ZIP 1.4%, Other 0.22%, PDF 0.07%, WAVE 0.06%, GZIP 0.05%, RealAudio 0.02%, PCD 0.01%, AIFF 0.0005%, GIF 0.00004%, ICC 0.00004%

DRS Format Composition by Size (~185 TB per Copy)
[Pie chart; legend: image, text, audio, other]
JP2 34%, TIFF 30%, ZIP 27%, WAVE 6%, GZIP 2%, JPEG 1%, RealAudio 0.24%, PDF 0.22%, XML 0.08%, AIFF 0.06%, TEXT 0.05%, PCD 0.02%, Other 0%

Born Digital Formats in Harvard Libraries
[Bar chart; series: Already have, Will have in 3 years; axis: Number of Libraries (out of 21 that answered), 0 to 20]
• Moving images / Video
• Audio recordings (including podcasts)
• Texts / Documents (e.g., Word, PDF, TXT)
• Still / 2D Images (e.g., TIFF, JPEG, etc.)
• Presentations (e.g., PPT, Keynote)
• Websites / Blogs
• Spreadsheets (e.g., Excel, Calc, Numbers)
• Databases (e.g. Oracle, MySQL, Access databases)
• Executable files (e.g., software other than computer…
• Email
• Datasets (other than GIS data, e.g. SAS, SPSS)
• 3D Images
• Geographic Information Systems (GIS) data (e.g.,…
• Drawings / Vector graphics (e.g., CAD/CAM,…
• Social media (e.g., Facebook pages, Twitter accounts)
• Enterprise systems data (e.g., data exported from an…
• Computer games
• Other type
Source: HL Preservation Needs Assessment (2013)

DRS Format Requests (2004 -)
[Pie chart; last updated 7/2015 (69 requests)]
Video 22%, Office Documents 14%, Vector Graphics 14%, Other Still Images 7%, Datasets 6%, Databases 6%, Software 6%, DNG 6%, Ebooks 4%, 3D Models 4%
Smaller slices (1% to 3% each): Forensic, OCR Text, Newspaper, GIS, Articles, Web Sites

Format Support Gap
[Timeline chart, 2000 to 2015. Labels: Arts., ICC, DBs, SW, Video, PDF, Email, SHP, More Audio, Web Harv., Data sets, XLS, Disk Imgs, Vector, GZIP, Opaque, Word, PPT, EPUB, Pyth. NBs, ZIP, DNG, CAD, 3D, News., JP2, Targets, JPEG, XML, PCD, TIFF, Text, GIF, RA, AIFF, WAV, SMIL, ESRI WFs]

2008: Stop-Gap Solution
• “Opaque objects and containers”
• Any format, BUT...
– Only bit-level preservation
– No delivery
– Very coarse description
– Less attention by preservation staff
• Moderate uptake: fewer than 20,000 Opaque containers

Adding Format Support – Old Workflow
• All analysis & development done in-house
– by existing staff
– concurrently with other projects / operations
– intermittently (requiring re-familiarization)
• Sometimes stalled by lack of expertise
• Ad-hoc, undocumented process

New Fast-Tracking Workflow
• 3 year project enabled by Arcadia
• Formats:
– video
– word processing formats
– CAD (2D and 3D)
– disk images
– video image sequences
– RAW camera images
• Goal: create a faster format support workflow that can be repeated
• New process working with consultants

Dave
MEDIA OBSOLESCENCE

Legacy Media (Mechanical)

Legacy Media (Magnetic)

Legacy Media (Optical)
Image source: https://www.attingo.com/img/Professionelle_Datenrettung/Datentraeger/cd_dvd_br-analyse.jpg

Legacy Media Failure Mode Examples

Media Degradation
“Degradation is well observed by custodians of media collections although only partly understood due to the scarcity of scientific data in this area.”
http://www.indiana.edu/~medpres/documents/iub_media_preservation_survey_FINALwww.pdf p33

Media Obsolescence
“Obsolescence has long been a concern, but has risen to the forefront in the last five years due to the accelerating loss of technologies supporting various formats.”
http://www.indiana.edu/~medpres/documents/iub_media_preservation_survey_FINALwww.pdf p33

Rendering Media
“All audio and video documents are machine readable formats … equipment has a major influence on the integrity and life expectancy of the carriers.
As a matter of principle, only the most advanced equipment of the latest generation that provides the gentlest of handling should be used to replay carriers. This equipment must fully comply with historical format parameters.”
Schüller, Dietrich. Audio and Video Carriers: Feb 2008. http://www.tape-online.net/docs/audio_and_video_carriers.pdf

Risk of Loss of Content
• Catastrophic failure of a recording from degradation so that no content is recoverable
• Partial failure from degradation so that only parts of content are recoverable
• Diminishment from degradation so that content is recoverable but at lesser quality
• Inability to optimally reproduce, or reproduce at all, a recording due to unavailability of playback machines, spare parts, repair expertise, or playback expertise
• Inability to preserve collections because it has become prohibitively expensive due to the extreme scarcity of playback machines and technical playback expertise
http://www.indiana.edu/~medpres/documents/iub_media_preservation_survey_FINALwww.pdf p33

Window of Opportunity
“Studies have concluded that many analog audio recordings must be digitized within the next 15 to 20 years—before sound carrier degradation and the challenges of acquiring and maintaining playback equipment make the success of these efforts too expensive or unattainable.”
Library of Congress National Recording Preservation Plan (p. 7): December 2012. http://www.loc.gov/rr/record/nrpb/PLAN%20pdf.pdf

SAVE Survey Snapshot from Aug. 2014
Harvard Collections
[Pie chart; legend: VIDEO, AUDIO, FILM; slices: 10.58%, 28.87%, 60.55%]

Andrea
VIDEO ANALYSIS – FORMATS, METADATA & TOOLS

Analysis Workflow
1. Divide up analysis responsibilities (in-house, consultants)
2. Determine format analysis criteria
3. Analyze formats
4. Create format profiles
5. Determine preservation strategy
6. Analyze metadata
7. Design DRS content model
8. Analyze tools

Video – Format Criteria
• Generic criteria, prioritized
– Very important (ex: dependency on a single organization or company): 9 criteria
– Somewhat important (ex: standardized): 9 criteria
– Not very important (ex: descriptive metadata support): 10 criteria
• Format-specific: 7 criteria, examples:
– Ability to encode in true lossless compression
– Max resolution

Video – Format Matrix (Partial View)

Video Preservation Strategy
• Prefer several formats as archival
– uncompressed, JPEG 2000, MPEG-2 and DV (for DV tape)
– provide a video reformatting service for these
• Accept a few popular proprietary formats but expect to fast-track migrations for them
– DNxHD, ProRes
• Few wrapper formats (QT, MXF)
• One delivery format (H.264)

Video – Metadata Analysis
• Technical metadata
– EBU Core 1.5 (aligns well with AES-60, structure mirrors MediaInfo’s output)
• Source metadata
– A revised UTVideoSrc (native suitability to physical media, right amount of detail)
• Process history
– A revised reVTMD (specific, simple, sufficient)
• Future work: descriptive metadata

Video – DRS Content Model
[Diagram; labels only]
• VIDEO OBJECT = 1 Object Descriptor, 1..n Video Files
• 1 metadata file and 1 or more derivative video files
• File relationship: HAS_SOURCE, 0..n Video Files (shown twice)
• Object relationships: HAS_DOCUMENTATION, HAS_LARGER_CONTEXT, HAS_SUPPLEMENT
• Related object types: VIDEO EDIT DECISION LIST OBJECT, DISK IMAGE OBJECT, CLOSED CAPTION DATA OBJECT, SUBTITLE DATA OBJECT, POSTER FRAME OBJECT, DOUBLE SYSTEM AUDIO OBJECT
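
The content model's cardinalities (1 object descriptor, 1..n video files, HAS_SOURCE to 0..n video files) can be illustrated with a minimal Python sketch. This is not the DRS implementation or its API: the class names, field names and file names are hypothetical, and reading HAS_SOURCE as "this file was derived from that file" is an assumption drawn from the slide's wording. Only the relationship names come from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Object-level relationship names as they appear on the content model slide.
OBJECT_RELATIONSHIPS = {"HAS_DOCUMENTATION", "HAS_LARGER_CONTEXT", "HAS_SUPPLEMENT"}


@dataclass
class VideoFile:
    """One video file (hypothetical class, not a DRS type).

    `sources` models HAS_SOURCE (0..n), assumed here to point at the
    file(s) this one was derived from.
    """
    name: str
    sources: List["VideoFile"] = field(default_factory=list)


@dataclass
class VideoObject:
    """Hypothetical stand-in for a video object: 1 descriptor, 1..n video files."""
    descriptor: str
    files: List[VideoFile]
    # Relationship name -> identifiers of related objects (free-form strings here).
    related: Dict[str, List[str]] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the sketch's rules hold."""
        problems = []
        if not self.descriptor:
            problems.append("missing object descriptor")
        if not self.files:
            problems.append("a video object needs at least one video file")
        for rel in self.related:
            if rel not in OBJECT_RELATIONSHIPS:
                problems.append(f"unknown relationship: {rel}")
        return problems


# Hypothetical example: an archival master and a deliverable derived from it.
master = VideoFile("archival_master.mov")
deliverable = VideoFile("deliverable.mp4", sources=[master])
video = VideoObject(
    descriptor="descriptor.xml",
    files=[master, deliverable],
    related={"HAS_SUPPLEMENT": ["closed-caption-object"]},
)
print(video.validate())
```

Related objects are kept as plain identifiers because the slide text does not make clear which object type pairs with which relationship.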
Video – Tool Analysis
• Incorporate MediaInfo into FITS (fitstool.info)
• Make FITS track-aware

Abigail
VIDEO DEVELOPMENT - ROADMAP

Development Timeline
[Timeline: Jan 2015, July 2015, Jan 2016. Milestones: video developer joins LTS; Q3 FY16: Video release 1]
Development for Release 1: single video file plus optional derivatives
• Deposit via Batch Builder
• Manage via Web Admin
• Deliver through Streaming Delivery Service

Development Timeline
[Timeline: Jan 2016, July 2016, Jan 2017. Milestones: Q3 FY16: Video release 1; Q1 FY17: Video release 2]
Development for Release 2:
• Multiple files and playlists
• Enhancements to caption support
• Delivery enhancements
Additional development (pending prioritization):
• Audio description support
• Multiple audio track support
• Poster Frame deposits
• And more

Dave
AV MATERIALS WORKING GROUP

AVWG Charge
The Stewardship Standing Committee charges the Audiovisual Materials Working Group (AVWG) to gather data and make recommendations regarding the priorities for digitizing audio and video content at Harvard and to make recommendations for the tools, policies, and resources needed for AV digitization, delivery, and preservation.
Out of scope: motion picture film (strategies already in place), photographs and other visual materials that are not considered time-based media, and commercially available audio and video content.

Summary of Activities
• Provide feedback to Library Technology Services (LTS) to aid in the development of basic and advanced delivery services for video content.
• Make recommendations on how to develop priorities for digitization of audio and video content.
• Develop recommendations for descriptive metadata for video content.

Informal Survey for Video Deposit
• Data gathered from 5 Harvard repositories represented in the AVWG
• Five year forecast

How many hours of digital video do you currently have?
[Bar chart: hours (axis 0 to 350) for Rep. 1 through Rep. 5]

How many hours of analog video do you plan to digitize and deposit to the DRS over the next 5 years?
[Bar chart: hours (axis 0 to 2000) for Rep. 1 through Rep. 5, broken out by Year 1 through Year 5]

Dave
MEDIA PRESERVATION SERVICES

Expanding Services
• Video Deposit (DRS)
• Video Technical Metadata
• Video Digitization
– 1” Open Reel Type C
– U-Matic
– Digi Beta
– Beta SP
– Betamax
– SVHS/VHS
– DVCAM/DVC-PRO/DV
– Video-8

Expanding Services
Formats
• Mathematically lossless JPEG2000
• QuickTime Uncompressed
• Apple ProRes
• Avid DNxHD
• H.264
Metadata Standards
• EBUCore
• UTVideoSrc
• reVTMD

Andrea
TIMING

Timing (now – end of 2015)
• Develop guidelines
– Early October 2015: recommendations for setting AV digitization priorities (AV Materials WG)
– Rest of 2015: Discussion (Steering Committees and Library Leadership Team)
• Analyze cost model for AV material
– Analysis and recommendations with goal to provide direction for FY17 budgets (Library Finance, LTS, Preservation Services)

Timing (Rest of FY16)
• MPS reformatting and DRS piloting
– 3 collections with funding deadlines including Hidden Collections with A/V materials
– Monitor impact (network transfers, storage, delivery performance, etc.)
– Gain experience (efficient workflows, staff expertise)
– Draft guidelines (estimating size and cost, specifications for vendors)

Timing (FY17 -)
• Open up video reformatting and DRS deposit service
• Refine cost model for video
• Guidelines for prioritizing AV digitization

Franziska
Q&A