AUDIO STANDARDS

National Gallery of the Spoken Word (NGSW) projects, based at Michigan State University
"Digitizing Speech Recordings for Archival Purposes," http://www.historicalvoices.org/papers/audio_digitization.pdf
Sample rate: 96,000 Hz
Bit depth: 24-bit (16-bit as the minimal, CD-quality setting)

Sample rate note: The putative influence of supersonic harmonics on brain function is cited as one of the primary reasons for choosing a sample rate of 192 kHz.

Digitization: Digitization is the process of converting an analog, continuous waveform to digital form by means of an analog-to-digital converter (ADC).

Choosing the standards: One important conclusion that we can already make at this point is that our digitization standards should be able to faithfully represent acoustic signals that vary in dynamics and have a frequency response of, minimally, 0 to 20,000 Hz, which happens to coincide with the typical human hearing range. Such minimal requirements can be met, in theory, by using standard "CD quality" settings: a sample rate of 44,100 Hz and a bit depth of 16 bits. However, it is also true that higher specifications, especially a 24-bit word length, help capture more detail and minimize digitization noise and distortion.

Hz and kHz: Acoustic signals that humans can hear lie in a limited range of about 20 to 20,000 Hz. Thus, intuitively, to reconstruct the original analog signal exactly, one should use a sample rate of at least 40,000 Hz, twice the highest frequency to be captured. This is, indeed, true. It is often recommended to follow CD-quality practice, with a sample rate of 44.1 kHz, for all spoken-word digitization projects, even though the speech signal usually does not contain any information above 7 kHz and, theoretically, a sample rate of 16 kHz should be sufficient to capture all details of the signal.
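The sampling and quantization arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the dynamic-range formula assumes an ideal quantizer driven by a full-scale sine wave, so real converters will measure somewhat lower.

```python
import math


def min_sample_rate(max_freq_hz: float) -> float:
    """Nyquist criterion: sample at (strictly, more than) twice the highest frequency to capture."""
    return 2 * max_freq_hz


def quantization_levels(bits: int) -> int:
    """Number of distinct amplitude values an N-bit linear PCM word can represent."""
    return 2 ** bits


def ideal_dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine (about 6.02*N + 1.76 dB)."""
    return 20 * math.log10(2 ** bits) + 1.76


for label, f in [("human hearing", 20_000), ("speech", 7_000)]:
    print(f"{label}: content up to {f:,} Hz needs at least {min_sample_rate(f):,} Hz")

for bits in (16, 24):
    print(f"{bits}-bit: {quantization_levels(bits):,} levels, "
          f"about {ideal_dynamic_range_db(bits):.1f} dB")
```

For 16 bits this gives 65,536 levels and roughly 98 dB; for 24 bits, 16,777,216 levels and roughly 146 dB. The speech case shows why 16 kHz is theoretically enough for content below 7 kHz.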
Even for speech, it is recommended to use a sample rate of 44.1 kHz for all A/D conversion for archival purposes. For high-fidelity applications, such as archival copies of analog recordings, 16 bits per sample (65,536 levels), the so-called 16-bit resolution, should be used.
Sufficient: a 44,100 Hz sampling rate and 16-bit resolution.
Maximum: a 96,000 Hz sampling rate and 24-bit quantization.

Historical Voices standard choice: In light of the above arguments, Historical Voices has chosen the following digitization best practices for spoken word resources:
• Sample rate: 96,000 Hz
• Bit depth: 24-bit
• Oversampling delta-sigma A/D converter hardware

Such hardware should include the following features:
• 24-bit quantization (with a 16-bit option)
• 96,000 Hz maximum sample rate
• oversampling capability
• user-selected anti-aliasing filters
• a wide assortment of sample rates
• XLR (balanced) and RCA (unbalanced) inputs
• high-gain preamplifier to accommodate low-level input signals
• digital (minimally, S/PDIF and AES/EBU) inputs and outputs
• user-selected AC/DC coupling for all channels
• real-time indicator of signal overloading

It is also recommended that a digitizing system use a professional-level sound card that meets the following specifications:
• PCI interface
• 8- to 24-bit resolution
• variable sample rates, including 11.025 kHz, 44.1 kHz, and 96 kHz
• S/PDIF digital

Minnesota Digital Library
http://www.mndigital.org/digitizing/standards/, from "Digital Audio Working Group. Digital Audio Best Practices." http://www.bcr.org/dps/cdp/best/digital-audio-bp.pdf
Material type: digitized audio
• Recommended: 44.1 kHz sample rate, 24-bit depth. Pros: more accurately reproduces the sound of the source material; increased capability to enhance the source file for delivery; increased dynamic range; acceptable for publication and broadcast.
• Optimal: 96 kHz sample rate, 24-bit depth. Pros: reflects current professional audio standards; standard for DVD/HD audio; increased frequency range; more accurately reproduces the sound of high-frequency, high-quality source material, such as musical recordings; increased potential for enhancement of the source file for delivery; more potential for future applications. Potential: recommended benchmark for the future; highest recommended current quality. Cons: rapidly growing acceptance; reflects emerging professional audio standards.

"Sound Directions," Indiana University
http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/sd_bp_07.pdf
Best Practice 3: Route the signal from the playback machine to the analog-to-digital converter using the cleanest, most direct signal path possible.
Best Practice 4: Design the monitoring chain to allow instant comparison of the signal from the playback machine with the signal that has passed through the analog-to-digital converter.
Best Practice 5: Preservation studios must include test/calibration equipment to test and monitor the transfer chain itself for noise, as well as to test individual components for performance. During transfer, the test/calibration equipment shall not be inserted between the playback machine and the recorder.
Best Practice 6: Use the Broadcast Wave Format for the preservation of audio.
Best Practice 7: Use the <bext> chunk of the Broadcast Wave Format for metadata that is needed to interpret the contents of a file in the absence of accompanying metadata. Do not use it as the authoritative source or for metadata that may change over time.
Best Practice 8: Include the local name for the file in the Description field.
Best Practice 9: Use the Time Reference field to provide a time stamp showing the location of the file on a reference or destination timeline.

Obsolete: The audio CD was established with a sampling rate of 44.1 kHz and a bit depth of 16.
This combination is now almost universally considered inadequate for audio preservation of analog recordings.

Standard: There is currently wide agreement on bit depth for preservation transfer of analog sources, with 24 bits recommended.

Confusion: There is less agreement on sampling rate, and this topic remains somewhat controversial. Sampling rates higher than 44.1 kHz are recommended for the following reasons:[31]
• It is important to accurately capture noise, such as clicks and pops on a disc, and other inaudible high-frequency information, so that future signal-processing algorithms able to take advantage of higher-frequency information will have enough data to work as effectively as possible. Some of this noise resides in frequency ranges higher than can be captured at 44.1 kHz.
• Higher sampling frequencies enable manufacturers to build better anti-aliasing low-pass filters that operate more efficiently, thereby improving performance within the range of human hearing.[32]
• Many musical instruments are capable of producing information in higher frequency ranges, including inaudible higher-frequency harmonic content that also affects our perception of sounds.
• Higher sampling frequencies provide improved temporal response (the timing of the arrival of sounds), which in turn improves spatial imaging (the locations of sounds within a stereo or surround sound field).
• The limit of human hearing acuity is not yet known; therefore, the point of transparency of a recording system cannot be known.[33]

IASA-TC 04 recommends encoding to linear pulse-code modulation (PCM) with a minimum sample rate of 48 kHz, and for many purposes suggests transferring at 24 bit with a 96 kHz sampling rate. In fact, 24/96 has become the standard choice for audio preservation reformatting. For the reasons listed above, and for the format's wide support and sustainability, both Sound Directions institutions have selected 24-bit, 96 kHz linear PCM encoding.
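The filter argument in the second point can be made concrete. An anti-aliasing filter must pass the audio band and reject everything above the Nyquist frequency (half the sample rate); the narrower the gap between the two, the steeper the filter has to be. This sketch assumes a 20 kHz passband edge and ignores the digital decimation filters that oversampling converters use; the function name is hypothetical.

```python
def transition_band_hz(sample_rate_hz: float, passband_edge_hz: float = 20_000) -> float:
    """Width of the region between the top of the audio passband and the Nyquist frequency,
    within which an anti-aliasing low-pass filter must go from pass to full rejection."""
    nyquist = sample_rate_hz / 2
    if passband_edge_hz >= nyquist:
        raise ValueError("passband edge must lie below the Nyquist frequency")
    return nyquist - passband_edge_hz


for fs in (44_100, 48_000, 96_000):
    print(f"{fs:,} Hz sampling: transition band of {transition_band_hz(fs):,.0f} Hz")
```

At 44.1 kHz the filter has only 2,050 Hz in which to reach full attenuation, while at 96 kHz it has 28,000 Hz, which allows a much gentler filter slope.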
For the evaluation of other potential encoding schemes, such as 1-bit sigma-delta, the cautious preservationist is well served by the work of Caroline R. Arms and Carl Fleischhauer at the Library of Congress on the sustainability of digital formats.[34] This work explores a number of sustainability factors for any digital format, including disclosure, adoption, transparency, self-documentation, and impact of patents.

The archival master is the BWF file. Below are the specifics of the transfer chain: first for tape, then for disc transfers.

Tape transfer chain (device type: device; channel/connector; comments):
• Playback machines: Studer A810 or Tascam 122 MKII; analog/XLR
• Analog-to-digital converter: Benchmark ADC1; digital AES/EBU output/XLR
• Sound card: Lynx AES16 PCI card; digital AES/EBU input/DB-25 connector
• Computer: Dell Optiplex GX620, Pentium 4 processor, 3.8 GHz, 2.0 GB RAM
• Slow-speed playback machine and line amp: Revox B77 and Gaines Balanced Line Interface; analog RCA outputs from the Revox; for tapes recorded at 1.875 and 0.938 ips. The Gaines device is inserted between the Revox and the converter.

Disc transfer chain (device type: device; channel/connector; comments):
• Playback machine: Technics SP-15 turntable; analog RCA outputs from the cartridge; includes SME 3012 tonearm, Stanton 500 cartridge, various styli
• Preamp 1: KAB Souvenir EQS MK12, using flat setting; analog RCA in/analog balanced TRS out
• Preamp 2: Owl 1, using a playback EQ curve; analog unbalanced RCA in and out; used only when necessary for the playback curve
• Both preamps are used together to generate flat and equalized files at the same time.
• Converter, sound card, and computer: same as above.

"Positioning for Our Future: Report of the Library of Tomorrow Planning Teams," University of Virginia Library, April 10, 2001
www.lib.virginia.edu/digital/info/loftreport.doc
Audio (purpose/goal: recommended attributes and formats; comments):
• Archival quality: 44.1 kHz, 16 bits per sample; file formats AIFF, WAV, SND. Maintain the channel pattern of the original, e.g. stereo, mono, multi-channel.
• Service quality: 11 or 22.05 kHz, 8 or 16 bits per sample; file formats AIFF, WAV, SND. Maintain the channel pattern where practical.
• Deliverable quality: amount of audio compression applied appropriate for the target community; file formats RealMedia, QuickTime, MPEG. Maintain the channel pattern where necessary.
• Preview/thumbnail quality: same as above, but reduced duration (a "clip").

University of Virginia Library: Digitization Guidelines
http://guides.lib.virginia.edu/content.php?pid=40437&sid=297547
Audio standards
Format: audiotape, analog disc
• Preservation master: 96 kHz, 24-bit PCM digital audio WAV files on data DVD-R and portable hard drive
• Preservation access master: 44.1 kHz, 16-bit PCM digital audio WAV files on CD-DA audio CDs
The specifications are based on standards developed by the International Association of Sound and Audiovisual Archives, the British Library, the University of Maryland Library, the Sound Directions project by Indiana University and Harvard University, the National Film Preservation Foundation, and the University of Virginia's Digital Media Lab. Specifications for preservation access masters are also based on the anticipated delivery format, such as streaming Flash or QuickTime video.
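The storage cost of these master formats follows directly from sample rate × bytes per sample × channels × duration. This sketch assumes stereo, uncompressed linear PCM with 24-bit samples stored in three bytes (as in WAV), and ignores header overhead; the function name is hypothetical.

```python
def pcm_bytes_per_minute(sample_rate: int, bit_depth: int, channels: int = 2) -> int:
    """Size of one minute of uncompressed linear PCM, header overhead ignored."""
    if bit_depth % 8:
        raise ValueError("bit depth must be a whole number of bytes in this sketch")
    return sample_rate * (bit_depth // 8) * channels * 60


masters = {
    "access master, 44.1 kHz / 16-bit": (44_100, 16),
    "44.1 kHz / 24-bit": (44_100, 24),
    "preservation master, 96 kHz / 24-bit": (96_000, 24),
}
for name, (rate, bits) in masters.items():
    mb = pcm_bytes_per_minute(rate, bits) / 1_000_000
    print(f"{name}: {mb:.1f} MB per stereo minute, {mb * 60 / 1000:.2f} GB per hour")
```

By this arithmetic a stereo minute takes about 10.6 MB at 44.1 kHz/16-bit and about 34.6 MB at 96 kHz/24-bit, roughly 3.3 times as much; moving from 16 to 24 bits alone adds 50%.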
Items are selected for audiovisual reformatting according to format, condition, and uniqueness, as well as patron request and playback-machine obsolescence. At present, priority is given to audiovisual materials with inaccessible formats, poor condition, and one-of-a-kind or rare status. Preservation-level reformatting standards for audiovisual material on optical media or in the form of digital files have not yet been determined. Preservation Services is investigating the use of uncompressed files on data tape, such as LTO-3/4, and Motion JPEG 2000 files as new standards for moving image preservation masters. N.B. The standards and specifications mentioned above are not written in stone; in fact, they are in flux as different options for audiovisual preservation emerge. Preservation Services welcomes all ideas and suggestions pertaining to reformatting audiovisual materials.

North Carolina ECHO
http://www.ncecho.org/dig/guide_4production.shtml
Audio file storage requirements:
• Minimum: 44.1 kHz sample rate, 16-bit depth. Pros: conserves storage space; lowest processing time. Cons: concerns over migration quality; limits the ability to enhance the source file for delivery.
• Recommended: 44.1 kHz sample rate, 24-bit depth. Pros: accurate reproduction of source material; increased dynamic range; increased ability to enhance the source file for delivery. Cons: requires 50% additional storage space; requires additional processing time.
• 96 kHz, 24-bit: Pros: current professional audio standard; increased frequency range; further increased ability to enhance the source file for delivery; highest recommended current quality. Cons: dramatically increased storage and processing; may require compression for delivery.

National Library of Canada, "Digital audio at the National Library of Canada," August 8, 1997
http://epe.lac-bac.gc.ca/100/202/301/netnotes/netnotes-h/notes49.htm
Digitizing audio
Analog-to-digital (A to D) converters: The Recorded Sound Studio at the National Library of Canada is equipped to digitize analog signals using 20-bit (1,048,576 possible values) analog-to-digital converters. The advantage of 20-bit over 16-bit conversion is immediately audible in low-level signals (room reverberation, quiet passages, etc.) and in the overall "naturalness" of the sound.
Signal routing: Once the audio signal has been digitized, it must be routed to another piece of equipment, such as a recorder or noise-reduction system. There are several digital audio communication standards. The most popular professional one, and the one the National Library uses, is AES (Audio Engineering Society). AES can carry two channels of up to 24-bit audio on a single cable, and is designed to minimize the induction of outside interference into the longer cable runs found in studios.
Digital audio workstation: All operations are executed at 32-bit precision. The system will be capable of handling the proposed DVD standard of a 96 kHz sampling rate with 24 bits.
Storage: At present, the CD is by far the most popular digital storage medium, with a projected lifespan of between 40 and 100 years. DVD, the newest digital audio format, is backwards compatible with the CD, which should ensure that playback hardware remains available for both for a long time. The drawback of the CD is that it is a 16-bit medium only; therefore, anything longer than 16 bits must be removed. Over the past few years, "noise shaping" dither has been established as a sonically superior process for shortening word lengths. Noise shaping relies on the fact that the ear is more sensitive to midrange frequencies (around 4 kHz) than to either low or high frequencies.
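The word-length reduction described here can be sketched generically. This is not Apogee's UV22, whose filter is proprietary; it is a common textbook approach, assuming triangular (TPDF) dither of ±1 output LSB and first-order error feedback, which shapes the requantization noise by (1 − z⁻¹) so that it rises toward high frequencies where the ear is less sensitive. Production noise shapers use higher-order filters matched to hearing curves. Function and parameter names are hypothetical.

```python
import math
import random


def reduce_word_length(samples, in_bits=20, out_bits=16, noise_shaping=True, seed=0):
    """Requantize signed integer PCM samples from in_bits to out_bits.

    Generic technique (not UV22): TPDF dither plus optional first-order error feedback.
    """
    rng = random.Random(seed)                      # seeded so results are reproducible
    step = 1 << (in_bits - out_bits)               # one output LSB in input units
    lo, hi = -(1 << (out_bits - 1)), (1 << (out_bits - 1)) - 1
    error = 0.0                                    # error made on the previous sample
    out = []
    for x in samples:
        # Error feedback: subtracting the previous error leaves e[n] - e[n-1] at the output,
        # a first-order high-pass shaping of the requantization noise.
        target = x - error if noise_shaping else x
        dither = (rng.random() - rng.random()) * step   # triangular PDF in (-step, +step)
        q = round((target + dither) / step)             # requantize to the output grid
        if noise_shaping:
            error = q * step - target                   # includes dither, so it is shaped too
        out.append(max(lo, min(hi, q)))                 # clip to the output word range
    return out


# Arbitrary 20-bit test tone: 1 kHz at 48 kHz, well below full scale.
tone = [round(300_000 * math.sin(2 * math.pi * 1_000 * n / 48_000)) for n in range(480)]
cd_words = reduce_word_length(tone)
```

Dropping four bits (20 to 16) makes one output step equal to 16 input steps, which is the value `step` takes in this configuration.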
The National Library of Canada uses the Apogee AD-1000 with the UV22 Super CD Encoding system, which, from a 20-bit signal, removes the last four bits and feeds them back into the input signal via a filter that adds an algorithmically generated "clump" of energy around 22 kHz (beyond the theoretical threshold of hearing and at the upper frequency limit of the CD). The net result is a lower noise floor and a much more natural-sounding CD that captures the resolution and detail of 20-bit signals in a 16-bit word length.

As a long-standing digital audio format, WAV remains the de facto standard for audio files in use today.

NISO
http://framework.niso.org/node/37
Moving images: video recordings on conventional tangible media (analog and digital videotapes, DVDs)
CURRENT PRACTICE, HYBRID APPROACH: For the reformatting of videotapes, most archives continue to produce a new videotape as a preservation master, typically a Digital Betacam (DigiBeta); some archives may use the more expensive D1, D5, or other types. All of these magnetic tape formats are obsolete, however, and may require re-reformatting within a decade. Service copies are generally digital files: in a high-bandwidth LAN, high-bit-rate MPEG-2 or MPEG-4 files in larger picture sizes; for lower-bandwidth applications and the Web, lower-rate MPEG-4, RealVideo, or QuickTime formats with smaller picture sizes. A good introduction is provided by the Association of Moving Image Archivists (AMIA) in Reformatting for Preservation: Understanding Tape Formats and Other Conversion Issues (http://www.amianet.org/resources/guides/storage_standards.pdf).
EXPLORING FILE-BASED MASTERS: Little in the way of fully realized, experience-based documentation exists for this approach; much must be gleaned from e-mail discussion lists and personal communication.
One useful guideline for making files containing uncompressed video streams is Standards Analysis for Video Objects: Recommended minimum requirements for preservation sampling of moving image objects, by Isaiah Beard for the Rutgers University RUcore project (http://rucore.libraries.rutgers.edu/collab/ref/dos_avwg_video_obj_standard.pdf). Meanwhile, several experts advocate preservation masters that employ a "frame-by-frame" approach: individual frame images may be uncompressed or encoded as JPEG 2000 (lossless or lossy) within a suitable wrapper (MXF, Motion JPEG 2000, AVI, others); or as MPEG-2 or MPEG-4 "all I-frame" encodings; or even as DV. For the MPEG and DV lossy encodings, higher data rates (e.g., 50 Mbps) are preferred to lower ones. Reformatting (to tapes as well as files) often requires transcoding, e.g., from composite to component color space and, for compressed formats, compressing the signal. In contrast, it is possible to extract the native digital signal from formats like DVDs (MPEG-2) or DV/DVC/DVCPRO videotapes (DV), but there seems to be no established practice for this. Making a file entails placing the encoded digital essence in a wrapper, e.g., MXF, Motion JPEG 2000, AVI, QuickTime, MPEG-4, but again, the community has not yet established practices.
REGARDING SOUNDTRACKS: Sound may be interleaved with the video in the "stream," or may be managed as a separate element within several wrapper formats (e.g., MXF, Motion JPEG 2000, AVI). Audio encoding may be uncompressed linear PCM or compressed (usually lossy) in an encoding that is accepted by the wrapper.

NINCH Guide to Good Practice
http://www.nyu.edu/its/humanities/ninchguide/VII/
Audio formats (extension, meaning: description; strengths/weaknesses):
• .aif, .aifc, Audio Interchange File Format: Developed by Apple for storing high-quality music. Non-compressed format. Cannot be streamed. Can usually be played without additional plug-ins. Allows specification of sampling rates and sizes. .aifc is the same as .aif except that it has compressed samples. Strengths/weaknesses: high quality; flexible format; large file sizes.
• Liquid Audio Secure Download: Liquid Audio is an audio player and has its own proprietary encoder. Similar to MP3, it compresses files for ease of delivery over the Internet. Only an AAC CD encoder is available. Strengths/weaknesses: boasts CD quality; compressed file, thus some loss.
• .au, .snd, Sun Audio: Mostly found on Unix computers. Specifies an arbitrary sampling rate. Can contain 8-, 16-, 24-, and 32-bit samples. Strengths/weaknesses: in comparison to other 8-bit samples, it has a larger dynamic range; slow decompression rates.
• .mp3, MPEG-1 Layer 3: Compressed format. File sizes vary depending on sampling and bit rate. Can be streamed, but this is not recommended, as it is not the best format for streaming. Typical compression of 10:1. Samples at 32,000, 44,100, and 48,000 Hz. Strengths/weaknesses: small file sizes; good quality; RealAudio and Windows Media are better for streaming.
• .paf, PARIS (Professional Audio Recording Integrated System): Used with the Ensoniq PARIS digital audio editing system. Can contain 8-, 16-, and 24-bit samples.
• .ra, RealAudio: One of the most common formats, especially for web distribution. Compresses up to 10:1. Strengths/weaknesses: sound quality is passable, but not high; lossy compression.
• .sdii, Sound Designer II: Originally a digital sampling and editing platform; the format is still in use. Used mostly on Macs by professionals. It is a widely accepted standard for transferring audio files between editing software. Strengths/weaknesses: problems with playing on PCs; high quality; large file sizes.
• .sf, IRCAM: Usually used by academic users. 8 or 16 bit; specifies an arbitrary sampling rate.
• .voc: Older format; .wav files are far more common. Used mostly on IBM machines. It samples in relation to an internal clock. Strengths/weaknesses: not a flexible format.
• .wav, Wave: Microsoft Windows non-compressed format. Can usually be played without additional plug-ins. Specifies an arbitrary sampling rate. 8, 16, and 32 bit. Strengths/weaknesses: high quality; large file sizes; can be used on both Macs and PCs.
• MIDI, Musical Instrument Digital Interface: Good for instrumental music. The file plays digitally stored samples of instruments, which are located on a sound card.
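WAV is also the basis of the Broadcast Wave Format that Sound Directions recommends for archival masters (Best Practices 6 to 9 earlier in these notes): a BWF file is a RIFF/WAVE file with an added <bext> chunk. The sketch below assembles one with Python's struct module. It assumes the version 1 <bext> layout from EBU Tech 3285 (602 fixed bytes, with the Time Reference stored as a 64-bit count of samples since midnight) and plain WAVE_FORMAT_PCM for 24-bit audio. All helper names, the file name, originator, and date are hypothetical placeholders, and the audio is 10 ms of silence.

```python
import struct


def riff_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """4-byte ID + little-endian 32-bit size + payload, padded to an even length."""
    pad = b"\x00" if len(payload) % 2 else b""
    return chunk_id + struct.pack("<I", len(payload)) + payload + pad


def bext_payload(description, originator, date, time, time_reference, coding_history=""):
    """bext fields in the assumed version 1 layout: 602 fixed bytes, then CodingHistory."""
    fixed = struct.pack(
        "<256s32s32s10s8sQH64s190x",
        description.encode("ascii"),   # Best Practice 8: local file name goes here
        originator.encode("ascii"),
        b"",                           # OriginatorReference (left empty)
        date.encode("ascii"),          # OriginationDate, "yyyy-mm-dd"
        time.encode("ascii"),          # OriginationTime, "hh:mm:ss"
        time_reference,                # Best Practice 9: TimeReferenceLow/High as one LE 64-bit value
        1,                             # Version
        b"",                           # UMID (left empty); 190 reserved zero bytes follow
    )
    return fixed + coding_history.encode("ascii")


def fmt_payload(sample_rate, bits, channels):
    block_align = channels * bits // 8
    # Format tag 1 = WAVE_FORMAT_PCM
    return struct.pack("<HHIIHH", 1, channels, sample_rate,
                       sample_rate * block_align, block_align, bits)


def build_bwf(pcm, sample_rate, bits, channels, bext):
    body = (b"WAVE" + riff_chunk(b"bext", bext)
            + riff_chunk(b"fmt ", fmt_payload(sample_rate, bits, channels))
            + riff_chunk(b"data", pcm))
    return b"RIFF" + struct.pack("<I", len(body)) + body


def samples_since_midnight(hours, minutes, seconds, sample_rate):
    return (hours * 3600 + minutes * 60 + seconds) * sample_rate


rate, bits, channels = 96_000, 24, 2
pcm = bytes(960 * channels * bits // 8)         # 10 ms of digital silence (placeholder audio)
bext = bext_payload("tape_0042_sideA.wav", "Example Archive", "2024-01-15", "10:30:00",
                    samples_since_midnight(1, 0, 0, rate))
bwf = build_bwf(pcm, rate, bits, channels, bext)
```

A production workflow would normally rely on a maintained BWF-aware tool and would also record the coding history; this sketch only shows where the fields named in Best Practices 8 and 9 sit in the file.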