Federal Digitization Moving to Common Guidelines
The U.S. Federal Agencies Digitization Guidelines Initiative (FADGI)
http://www.digitizationguidelines.gov/

PASIG, May 24, 2013
Carl Fleischhauer cfle@loc.gov
Steve Puglia spug@loc.gov
Library of Congress
Washington, DC

18 Participating Agencies
http://www.digitizationguidelines.gov/participants/
Often participating, not “official”: NASA, NOAA, National Museum of Health and Medicine (U.S. Army), U.S. Supreme Court

http://www.digitizationguidelines.gov/stillimages/
http://www.digitizationguidelines.gov/audio-visual/

Guidelines
• Conceptual framework documents
  – Content Categories & Digitization Objectives (still image reproduction; September 3, 2009)
  – Digitization Activities – Project Planning (November 4, 2009)
• Capture device performance
  – Digital Imaging Framework (high level about scanner performance metrics; April 2, 2009)
  – Audio Analog-to-Digital Converter Performance (August 20, 2012)
  – Audio Interstitial Errors (about unwanted dropouts or sample distortion; work in progress, 2012-13)
• Broad practices guidelines
  – Technical Guidelines for the Still Image Digitization of Cultural Heritage Materials (Many segments from 2004 NARA document; FADGI update, August 24, 2010)

Guidelines
• Metadata including embedded data and file headers
  – TIFF Image Header Metadata (February 10, 2009)
  – Minimal Descriptive Embedded Metadata in Digital Still Images (Smithsonian document embraced by group; March 23, 2012)
  – Embedding Metadata in Broadcast WAVE Files, Version 2 (April 23, 2012)
    • Associated tool on SourceForge: BWF MetaEdit
  – NARA reVTMD video technical metadata (February 2012; FADGI supporting role)
    • Associated tool on GitHub: AVI MetaEdit
• Format analysis and guidelines
  – File Format Comparisons (comparing still image and video formats; under development in 2013)
  – MXF Preservation Video Formatting Application Specification (under development during 2013 in cooperation
with AMWA trade group; versions posted in 2010 and 2012)

Still Image Illustrative Example
Odds and ends about still images

Still image specifications – this is what we all “used to do”
• color/monochromatic
• pixel density (good old “dpi”)
• bit depth
• . . .
usually output-referred

We want to move toward more, um, “scientific” specifications
• Tone: Gamma
• Resolution: Spatial Frequency Response (SFR), Resolution, Sampling Efficiency, Sampling Frequency
• Color: Luminance Delta E2000, Delta E(a*b*)2000, Channel Mis-registration, White Balance
• Uniformity: % Lighting Non-uniformity
• Noise: Total rms deviation
From this document: http://www.digitizationguidelines.gov/guidelines/DIFfinal.pdf

Resolution rethink: new terms, scanner performance
• SAMPLING RATE
• SPATIAL RESOLUTION
  – Spatial Frequency Response (SFR)
• SAMPLING EFFICIENCY
Thanks to Barry Wheeler for his very helpful Signal blogs:
http://blogs.loc.gov/digitalpreservation/2012/12/what-resolution-should-i-use-part-1/
http://blogs.loc.gov/digitalpreservation/2013/01/what-resolution-should-i-use-part-2/
http://blogs.loc.gov/digitalpreservation/2013/03/what-resolution-should-i-use-part-3/

Resolution rethink: new terms, scanner performance
• SAMPLING RATE. Usually, the scanner’s ppi number is the sampling rate
  – Sensors can only attempt to measure (sample) the brightness at each point.
  – Some light may scatter and miss the sensor, the scanner’s motor step may not be sufficiently precise, or the collected value may be inaccurate. Inside every scanner or camera, between the sensor and the screen, is a small, highly specialized computer called a digital signal processor. This processor must work very hard to link a dot on the page to a dot on the screen.
• RESOLUTION. ISO standards (e.g., ISO 12233) define resolution in terms of Spatial Frequency Response (SFR): the actual result on the screen.
• SAMPLING EFFICIENCY. . . . the difference between the pixel count and actually resolving each point, expressed as a percentage.
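As a rough illustration of how sampling rate, resolution, and sampling efficiency relate, the sketch below treats sampling efficiency as the ratio of the resolution a device actually delivers to its nominal sampling rate. That reading is an assumption based on the wording of the slide, not the formula from ISO 12233 or the FADGI guideline. The function names and the example figures (a 400 ppi scanner that resolves 320 ppi) are hypothetical.

```python
def sampling_efficiency(nominal_ppi: float, resolved_ppi: float) -> float:
    """Return sampling efficiency as a percentage.

    Assumption: efficiency = resolved resolution / nominal sampling rate.
    """
    if nominal_ppi <= 0:
        raise ValueError("nominal_ppi must be positive")
    if resolved_ppi < 0:
        raise ValueError("resolved_ppi must not be negative")
    return 100.0 * resolved_ppi / nominal_ppi


def effective_resolution(nominal_ppi: float, efficiency_percent: float) -> float:
    """Return the resolution actually delivered at a given efficiency."""
    if nominal_ppi <= 0:
        raise ValueError("nominal_ppi must be positive")
    return nominal_ppi * efficiency_percent / 100.0


if __name__ == "__main__":
    # Hypothetical scanner: 400 ppi setting, 320 ppi actually resolved.
    print(sampling_efficiency(400, 320))   # 80.0
    # Hypothetical scanner: 600 ppi setting at 75% efficiency.
    print(effective_resolution(600, 75))   # 450.0
```

Under this reading, a device's ppi setting alone says little about what ends up in the file; the efficiency figure is what ties the advertised sampling rate to the measured result.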
From the revised guideline http://www.digitizationguidelines.gov/guidelines/FADGI_Still_Image-Tech_Guidelines_2010-08-24.pdf

Tools to Support Image Performance Measurement
• Digital Image Conformance Evaluation (DICE) System
  – Device Target – Imaging Device Performance
  – Object Target – Actual Image Quality
  – Software for Evaluation/Validation
    • Based in LabVIEW
    • Data export for use in SQC/SPC

Device and Object Targets
Object target as positioned for use

DICE Software – Main Panel

DICE – QC Summary Panel
Slide from old version of software

DICE – OECF detail page

DICE – SFR detail page

Audio-Visual Illustrative Example
MXF format specification for reformatted video

Library of Congress Packard Campus, Culpeper
National Archives, College Park
Smithsonian Institution Archives
SAMMA from Front Porch Digital

Implementations
• SAMMA at LC: Lossless compressed
  – Each frame is a JPEG 2000 image
  – Lossless (reversible) transform
• Emergent variants
  – NARA and other archives prefer uncompressed video
  – Other devices come on the market, e.g., from OpenCube (Belgium), Amberfin (UK), Cube-Tec (Germany), and others in process (e.g., Archimedia)

Standards-based format elements from SMPTE and ISO/IEC
• MXF (SMPTE ST 377 and many more)
• Standard definition uncompressed covered in ST 377 and also SMPTE ST 384
• JPEG 2000 encoding (ISO/IEC 15444-1)
• JPEG 2000 mapped to MXF (SMPTE ST 422)
• Other standards also play a role, most from SMPTE, some from EBU

Loose Ends
• MXF, JPEG 2000, and even “uncompressed” video are complex standards
• Entities that “conform” to the standards can be formatted in various ways
  – We have some elements that we want to include in order to produce an “authentic copy”
  – MXF “carriage” can be tricky to sort out

MXF Application Specification
• An MXF AS is what some would call a profile
• Pin down preferred options, reduce the variables
• Support greater interoperability
• Increase the comfort level for users
• Increase vendor competition
• More adoption
means better sustainability

Timecode
• Source recordings may have multiple timecodes (VITC, LTC, etc.), some on purpose, some by accident; all may provide forensic help for future researchers.
• Specify preferred practice for retaining and tagging multiple timecodes in the file

Audio tracks
• Source may have multiple tracks
• MXF audio track specifications cover “listing” or “allocation” (tagging) and other matters of terminology; we need to pin these down

Metadata
• Basic tech metadata is not an issue
• Needed: specified options for embedding additional technical metadata:
  – process (like METS digiprov)
  – about the source item
  – about quality review outcomes
  – preservation (like PREMIS)
• And some descriptive metadata
  – Schools of thought: some prefer minimal data (“just an identifier”), others would dump everything they have; the specification should permit a range of actions – “archivist’s choice”

Closed captioning, subtitles, ancillary data
• US broadcast standards embed CC as binary data
  – “In the image raster” on line 21
  – For digital TV, CC also in packets in MPEG stream
  – Awkward for future extraction, depends upon availability of decoding tools
• Desiderata
  – Put CC/subtitles in the file for easier access and extraction
  – XML rather than binary
  – Alas, MXF offers “too many” options for this; we seek to pin down the best ones
• By extension, this also applies to other ancillary data.

An MXF Application Specification is . . .
• A formal industry statement
  – Not a “standard”
• Accompanied by a reference implementation and validation tools

MXF Application Specifications come from . . .
• Advanced Media Workflow Association (AMWA)
  – Broadcast-industry group
  – AMWA Application Specifications include:
    • AS-10 for production – version for end-to-end digital production workflow (forthcoming)
    • AS-11 for contribution – the high end version contributed by a producer to a television network (published)
    • AS-03 for delivery – the reduced-data version “sent to the tower for broadcast” (published)
  – AS-07 for archiving and preservation will be a sibling to those
  – http://www.amwa.tv/projects/AMWA_AS_overview 04-2013 web.pdf

Role of AMWA
• Key roles played by Turner Broadcasting veterans and engineering staff
• Members include AVID, BBC, Front Porch Digital (SAMMA), NARA, PBS, SONY, Discovery Communications, Fox, NBC Universal, and more
• http://www.amwa.tv/
• Break into technical committees to push draft specifications

FADGI’s AMWA status
• March 2012
  – AMWA business committee approval to move ahead
  – Designate as AS-07
• September 2012
  – Technical committee approval
• November 2012
  – Team meetings began
• Early 2013
  – Churning along
• End of 2013
  – Dream of a first draft or better

http://www.digitizationguidelines.gov/
Carl Fleischhauer cfle@loc.gov