Audio Standards - ISfiveeightyfive

AUDIO STANDARDS
National Gallery of the Spoken Word (NGSW) projects, based at Michigan State University
http://www.historicalvoices.org/papers/audio_digitization.pdf
Digitizing Speech Recordings for Archival Purposes
Sample rate
Bit depth
putative influence of supersonic harmonics on brain function as one of the primary reasons for choosing the sample rate of 192 kHz
96,000 Hz;
Choosing the standards
One important conclusion that we can already make at this point is that our digitization standards should be able to faithfully represent acoustic signals that are varied in dynamics and have a frequency response of, minimally, 0-20,000 Hz, which happens to coincide with typical human hearing range. Such minimal requirements can be accomplished, in theory, by using standard, “CD quality” settings of a sample rate of 44,100 Hz
bit-depth: 24-bit
16-bit
However, it is also true that higher specifications, especially 24-bit word length, help capture more detail and minimize digitization noise and distortion.
Digitization
Digitization is a process of converting an analog, continuous, waveform to digital form by an Analog-to-Digital converter (ADC).
Even though the speech signal usually does not contain any information above 7 kHz, and, theoretically, the sample rate of 16 kHz should be sufficient to capture all details of the signal, it is nevertheless recommended to use the sample rate of 44.1 kHz for all of AD conversion for archival purposes
Hz & kHz
Acoustic signals that humans can hear lie in a limited range of about 20 to 20,000 Hz. Thus, intuitively, in order to reconstruct exactly the original analog signal, one should use the sample rate of at least 40,000 Hz. This is, indeed, true. It is often recommended to use the practice of CD quality that has a sample rate of 44.1 kHz for all of spoken word digitization projects.
For high-fidelity applications, such as archival copies of analog recordings, 16 bits per sample (65,536 levels), or a so-called 16-bit resolution, should be used.
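The sample-rate reasoning above (a rate of at least twice the highest frequency of interest) can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names are hypothetical, and the 7 kHz and 20 kHz bandwidths are the figures quoted in the text.

```python
def min_sample_rate_hz(max_frequency_hz: float) -> float:
    """Lowest sample rate that covers the given bandwidth (twice the top frequency)."""
    if max_frequency_hz <= 0:
        raise ValueError("max_frequency_hz must be positive")
    return 2.0 * max_frequency_hz


def nyquist_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2.0


# Bandwidths quoted in the text: speech (about 7 kHz) and human hearing (20 kHz).
for label, f_max in (("speech", 7_000), ("human hearing", 20_000)):
    print(f"{label}: needs at least {min_sample_rate_hz(f_max):,.0f} Hz")

# Rates mentioned in the text and the bandwidth each one covers.
for rate in (16_000, 44_100, 96_000):
    print(f"{rate:,} Hz covers up to {nyquist_frequency_hz(rate):,.0f} Hz")
```

On these numbers, 16 kHz covers the 7 kHz speech band and 44.1 kHz covers the 20 kHz hearing range with a small margin, which matches the argument in the excerpt.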
Sufficient to use the 44,100 Hz sampling rate and a 16-bit resolution.
maximum 96,000 Hz sampling rate and a 24-bit quantization.
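The bit-depth figures above can be checked with a short Python sketch. It computes the number of quantization levels for a word length and the corresponding ideal dynamic range, taken here as 20·log10 of the number of levels (roughly 6 dB per bit). The function names are hypothetical, and real converters fall short of this ideal figure.

```python
import math


def quantization_levels(bits: int) -> int:
    """Number of distinct sample values available at a given word length."""
    return 2 ** bits


def ideal_dynamic_range_db(bits: int) -> float:
    """Ratio of the largest to the smallest step, in dB (about 6.02 dB per bit)."""
    return 20.0 * math.log10(quantization_levels(bits))


for bits in (16, 24):
    levels = quantization_levels(bits)
    print(f"{bits}-bit: {levels:,} levels, about {ideal_dynamic_range_db(bits):.1f} dB")
```

For 16 bits this gives the 65,536 levels cited in the text.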
Historical Voices Standard Choice
In light of the above arguments, Historical Voices has chosen the following digitization best practices for spoken word resources:
• Sample rate: 96,000 Hz
• Bit-depth: 24-bit
• Oversampling delta-sigma A/D converter hardware
Such hardware should include the following features:
• 24-bit quantization (with a 16-bit option)
• 96,000 Hz maximum sample rate
• oversampling capability
• user-selected anti-aliasing filters
• a wide assortment of sample rates
• XLR (balanced) and RCA (unbalanced) inputs
• high gain preamplifier to accommodate low-level input signal levels
• digital (minimally, SPDIF and AES/EBU) inputs and outputs
• user-selected AC/DC coupling for all channels
• real-time indicator of signal overloading
recommended that a digitizing system use a professional-level sound card that meets the following specifications:
• PCI Interface
• 8 to 24 bit resolution
• variable sample rates, including 11.025 kHz, 44.1 kHz, and 96 kHz
• S/PDIF digital
Minnesota Digital Library - http://www.mndigital.org/digitizing/standards/ - from Digital Audio
Best Practices - “Digital Audio Working Group. Digital Audio Best Practices.”
http://www.bcr.org/dps/cdp/best/digital-audio-bp.pdf
Institution: Minnesota
Material type: Digitized audio
Sample rate: 44.1 kHz (Recommended)
Bit depth (color depth): 24 bit (Recommended)
Pros: More accurately reproduces sound of source material. Increased capability to enhance source file for delivery. Increased dynamic range. Acceptable for publication and broadcast.
Material type: Digitized audio
Sample rate: 96 kHz (Optimal)
Bit depth (color depth): 24 bit (Optimal)
Pros: Reflects current professional audio standards. Standard for DVD/HD Audio. Increased frequency range. More accurately reproduces sound of high frequency, high quality source material, such as musical recordings. Increased potential for enhancement of source file for delivery. More potential for future applications. Potential Recommended benchmark for future. Highest recommended current quality.
Cons
Rapidly growing acceptance. Reflects emerging professional audio standards.
“Sound Directions” Indiana University
http://www.dlib.indiana.edu/projects/sounddirections/papersPresent/sd_bp_07.pdf
Best Practice 3: Route the signal from the playback machine to the analog-to-digital converter using the
cleanest, most direct signal path possible.
Best Practice 4: Design the monitoring chain to allow instant comparison of the signal from the playback
machine to the signal that has passed through the analog-to-digital converter.
Best Practice 5: Preservation studios must include test/calibration equipment to test and monitor the
transfer chain itself for noise as well as to test individual components for performance. During transfer, the
test/calibration equipment shall not be inserted between the playback machine and the recorder.
Best Practice 6: Use the Broadcast Wave Format for the preservation of audio.
Best Practice 7: Use the <bext> chunk of the Broadcast Wave Format for metadata that is needed to
interpret the contents of a file in the absence of accompanying metadata. Do not use it as the authoritative
source or for metadata that may change over time.
Best Practice 8: Include the local name for the file in the Description field.
Best Practice 9: Use the Time Reference field to provide a time stamp showing the location of the file on a
reference or destination timeline.
obsolete
The audio CD was established with a sampling rate of 44.1 kHz at a bit depth of 16. This combination is now almost universally considered inadequate for audio preservation of analog recordings.
standard
There is currently wide agreement on bit depth for preservation transfer of analog sources with 24 bits recommended.
confusion
There is less agreement on sampling rate and this topic remains somewhat controversial
recommend higher sampling rates than 44.1 kHz for the following reasons:


• It is important to accurately capture noise, such as clicks and pops on a disc, and other inaudible, high frequency information so that improved signal processing algorithms in the future that are able to take advantage of higher frequency information will have enough data to work as effectively as possible. Some of this noise resides in frequency ranges higher than can be captured at 44.1 kHz
• Higher sampling frequencies enable manufacturers to build better anti-aliasing low-pass filters that operate more efficiently, thereby improving performance within the range of human hearing
• Many musical instruments are capable of producing information in higher frequency ranges, including inaudible higher frequency harmonic content that also impacts our perception of sounds
• Higher sampling frequencies provide improved temporal response, or the timing of the arrival of sounds, that in turn improves spatial imaging (the locations of sounds from within a stereo or surround sound-field)
• The limit of human hearing acuity is not yet known, therefore the point of transparency of a recording system cannot be known

IASA-TC 04 recommends encoding to linear pulse-code modulation (PCM) with a minimum sample rate of 48 kHz, and for many purposes suggests transferring at 24 bit with a 96 kHz sampling rate. In fact, 24/96 has become the standard choice for audio preservation reformatting. For the reasons listed above, and for the format’s wide support and sustainability, both Sound Directions institutions have selected 24 bit, 96 kHz linear PCM encoding. For the evaluation of other potential encoding schemes such as 1-bit sigma-delta, the cautious preservationist is well served by the work of Caroline R. Arms and Carl Fleischhauer at the Library of Congress on the sustainability of digital formats. This document explores a number of sustainability factors for any digital format including disclosure, adoption, transparency, self-documentation, and impact of patents.
Archival Master is the BWF file
Below are two tables with the specifics of the transfer chain—first, for tape and second, for disc transfers.
Tape transfers
Device Type: Playback machines. Device: Studer A810 or Tascam 122 MKII. Channel/Connector: Analog/XLR.
Device Type: Analog-to-digital converter. Device: Benchmark ADC1. Channel/Connector: Digital AES/EBU output/XLR.
Device Type: Sound card. Device: Lynx AES16. Channel/Connector: Digital AES/EBU input/DB 25 connector. Comments: PCI card.
Device Type: Computer. Device: Dell Optiplex GX620. Comments: Pentium 4 processor, 3.8 GHz, 2.0 GB RAM.
Device Type: Slow speed playback machine and line amp. Device: Revox B77 and Gaines Balanced Line Interface. Channel/Connector: Analog RCA outputs from Revox. Comments: For tapes recorded at 1.875 and 0.938 ips. The Gaines device is inserted between the Revox and converter.
Disc transfers
Device Type: Playback machine. Device: Technics SP-15 turntable. Channel/Connector: Analog RCA outputs from the cartridge. Comments: Includes SME 3012 tonearm, Stanton 500 cartridge, various styli.
Device Type: Preamp 1. Device: KAB Souvenir EQS MK12 using flat setting. Channel/Connector: Analog RCA in/Analog balanced TRS out.
Device Type: Preamp 2. Device: Owl 1 using a playback eq curve. Channel/Connector: Analog unbalanced RCA in and out. Comments: Used only when necessary for playback curve. Both preamps are used together to generate flat and equalized files at the same time.
Same converter, card, and computer as above
www.lib.virginia.edu/digital/info/loftreport.doc
“Positioning for Our Future: Report of the Library of Tomorrow Planning Teams”
University of Virginia Library
April 10, 2001
Audio
Purpose/Goal: Archival quality
Recommended Attributes/Formats: Attributes: 44.1 kHz, 16 bits per sample. File format(s): AIFF, WAV, SND
Comments: Maintain channel pattern of original, e.g. stereo, mono, multi-channel.
Purpose/Goal: Service quality
Recommended Attributes/Formats: Attributes: 11 or 22.05 kHz, 8 or 16 bits per sample. File format(s): AIFF, WAV, SND
Comments: Maintain channel pattern where practical.
Purpose/Goal: Deliverable quality
Recommended Attributes/Formats: Attributes: amount of audio compression applied appropriate for the target community. File format(s): RealMedia, QuickTime, MPEG
Purpose/Goal: Preview/Thumbnail quality
Recommended Attributes/Formats: Attributes and File format(s): Same as above, but reduced duration -- a "clip"
Comments: Maintain channel pattern where necessary.
MOVING IMAGE FORMATS
UNIVERSITY OF VIRGINIA LIBRARY – DIGITIZATION GUIDELINES
http://guides.lib.virginia.edu/content.php?pid=40437&sid=297547
Audio Standards
Format: Audiotape, analog disc
Standard:
preservation master: 96 kHz 24 bit PCM digital audio WAV files on data DVD‐R and portable hard drive
preservation access master: 44.1 kHz 16 bit PCM digital audio WAV files on CD‐DA audio CDs
The specifications are based on standards developed by the
International Association of Sound and Audiovisual Archives, The
British Library, the University of Maryland Library, the Sound Directions
project by Indiana University and Harvard University, the National Film
Preservation Foundation, and the University of Virginia's Digital Media
Lab. Specifications for preservation access masters are also based on
anticipated delivery format, such as streaming Flash or QuickTime
video.
Items are selected for audiovisual reformatting according to format,
condition, and uniqueness, as well as patron request and playback
machine obsolescence. At present, priority is given to audiovisual
materials with inaccessible formats, poor condition, and one‐of‐a‐kind
or rare status.
Preservation‐level reformatting standards for audiovisual material on
optical media or in the form of digital files have not yet been
determined.
Preservation Services is investigating the use of uncompressed files on
data tape, such as LTO‐3/4, and Motion JPEG 2000 files as new
standards for moving image preservation masters.
N.B. The standards and specifications mentioned above are not written
in stone; in fact they are in flux as different options for audiovisual
preservation emerge. Preservation Services welcomes all ideas and
suggestions pertaining to reformatting audiovisual materials.
http://www.ncecho.org/dig/guide_4production.shtml
Audio file storage requirements
Institution: North Carolina ECHO
Minimum sample rate: 44.1 kHz. Minimum bit depth: 16-bit.
Pros: Maximizes storage space. Lowest level of processing time.
Cons: Concerns over migration quality. Limits ability to enhance source file for delivery.
Recommended sample rate: 44.1 kHz. Recommended bit depth: 24-bit.
Pros: Accurate reproduction of source material. Increased dynamic range. Increased ability to enhance source file for delivery.
Cons: Requires 50% additional storage space. Requires additional processing time.
96 kHz, 24-bit
Pros: Current professional audio standards. Increased frequency range. Further increased ability for enhanced source file delivery. Highest recommended current quality.
Cons: Dramatic increased storage and processing. May require compression for delivery.
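The storage trade-offs listed above follow directly from the size of uncompressed PCM audio, which is sample rate × bit depth × channels × duration. The Python sketch below works through the three settings for one hour of audio. It assumes stereo (two channels) and ignores file header overhead; the function name is hypothetical.

```python
def pcm_bytes(sample_rate_hz: int, bit_depth: int, channels: int, seconds: int) -> int:
    """Size in bytes of uncompressed linear PCM audio (headers ignored)."""
    return sample_rate_hz * bit_depth * channels * seconds // 8


ONE_HOUR = 3600
STEREO = 2  # assumption for this example; mono recordings need half as much

baseline = pcm_bytes(44_100, 16, STEREO, ONE_HOUR)
for rate, bits in ((44_100, 16), (44_100, 24), (96_000, 24)):
    size = pcm_bytes(rate, bits, STEREO, ONE_HOUR)
    print(f"{rate / 1000:g} kHz / {bits}-bit: {size / 1e9:.2f} GB per hour, "
          f"{size / baseline:.2f}x the 44.1 kHz / 16-bit size")
```

Moving from 16-bit to 24-bit at 44.1 kHz multiplies the size by exactly 1.5, which is the "50% additional storage space" noted in the table; 96 kHz at 24-bit needs a little over three times the space of the minimum setting.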
Institution – National library of Canada
“Digital audio at the national library of Canada” August 8, 1997
http://epe.lac-bac.gc.ca/100/202/301/netnotes/netnotes-h/notes49.htm
Digitizing audio
Analog to Digital (A to D) Converters
Signal routing
Digital Audio Workstation
Storage
The Recorded Sound Studio at the National Library
of Canada is equipped to digitize analog signals
using 20-bit (1 048 576 possible values) analog to
digital converters. The advantage of 20-bit over 16-bit conversion is immediately audible in low level
signals (room reverberation, quiet passages, etc.)
and in the overall "naturalness" of the sound.
Once the audio signal has been digitized, it must be
routed to another piece of equipment, such as a
recorder or noise-reduction system. There are
several digital audio communication standards. The
most popular professional one, and the one the
National Library uses, is AES (Audio Engineering
Society). AES can carry two channels of up to 24-bit audio on a single cable, and is configured to
minimize inductance of outside interference into
longer cable runs found in studios.
All operations are executed in 32-bit
The system will be capable of handling the
proposed DVD standard of 96kHz sampling rate
with 24 bits.
At present, the CD is by far the most popular digital
storage medium with a projected lifespan of
between 40 and 100 years
DVD, the newest digital audio format, is backwards
compatible with the CD. This should assure that
playback hardware is available for both for a long
time.
The drawback of the CD is that it is a 16-bit
medium only. Therefore, anything longer than 16
bits must be removed
Over the past few years, "noise shaping" dither has
been established as a sonically superior process for
shortening word lengths. Noise shaping relies on the
fact that the ear is more sensitive to midrange
frequencies (around 4 kHz) than it is to either low or
high frequencies. The National Library of Canada
uses the Apogee AD-1000 with UV22 Super CD Encoding system which, from a 20-bit signal,
removes the last four bits, feeds them back into the
input signal via a filter which adds an
algorithmically-generated "clump" of energy around
22 kHz (beyond the theoretical threshold of hearing
and at the upper frequency limit of the CD). The net
result is a lower noise floor and a much more
natural sounding CD which captures the resolution
and detail of 20-bit signals in a 16-bit word length.
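The general idea of shortening a 20-bit word to 16 bits with dither and noise shaping can be sketched in Python. This is not the UV22 process, which is proprietary; it is a generic textbook approach that assumes triangular (TPDF) dither and first-order error feedback, which pushes the requantization noise toward higher frequencies. The function name and parameters are hypothetical.

```python
import random


def reduce_word_length(samples, src_bits=20, dst_bits=16, seed=0):
    """Requantize signed integer samples using TPDF dither and error feedback."""
    step = 1 << (src_bits - dst_bits)      # one destination LSB in source units
    lo = -(1 << (dst_bits - 1))            # most negative destination value
    hi = (1 << (dst_bits - 1)) - 1         # most positive destination value
    rng = random.Random(seed)              # seeded so the sketch is repeatable
    error = 0.0
    out = []
    for sample in samples:
        target = sample - error            # subtract the previous sample's error
        dither = (rng.random() - rng.random()) * step  # triangular, +/- 1 LSB
        quantized = round((target + dither) / step)
        quantized = max(lo, min(hi, quantized))        # clip to the 16-bit range
        error = quantized * step - target  # error made on this sample
        out.append(quantized)
    return out


# A constant input halfway between two 16-bit steps: plain truncation would
# lose it entirely, while the dithered output preserves it on average.
shortened = reduce_word_length([8] * 1000)
print(sum(shortened) * 16 / len(shortened))
```

Averaged over many samples, the output scaled back to 20-bit units stays close to the input value of 8, which is the sense in which detail below the 16-bit step survives the word-length reduction.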
As a long-standing digital audio format, WAV remains the de facto standard for audio files in use today
NISO
http://framework.niso.org/node/37
Moving images, video recordings on conventional tangible media (analog and digital videotapes, DVDs)
See comments at right and list of resources in next table row.
CURRENT PRACTICE, HYBRID APPROACH:
For the reformatting of videotapes, most archives continue to produce a new
videotape as a preservation master, typically a Beta SX (DigiBeta); some
archives may use the more expensive D1, D5, or other types. All of these
magnetic tape formats are obsolete, however, and may require re-reformatting
within a decade. Service copies are generally digital files: in a high-bandwidth
LAN, high-bit-rate MPEG-2 or MPEG-4 files in larger picture sizes; for lower
bandwidth applications and the Web, lower-rate MPEG-4, RealVideo, or
QuickTime formats with smaller picture sizes. A good introduction is provided
by the Association of Moving Image Archivists (AMIA) in Reformatting for
Preservation: Understanding Tape Formats and Other Conversion Issues
(http://www.amianet.org/resources/guides/storage_standards.pdf).
EXPLORING FILE-BASED MASTERS:
Little in the way of fully realized,
experience-based documentation exists for this approach; much must be
gleaned from e-mail discussion lists and personal communication. One useful
guideline for making files containing uncompressed video streams is Standards
Analysis for Video Objects: Recommended minimum requirements for
preservation sampling of moving image objects, by Isaiah Beard for the
Rutgers University RUcore project
(http://rucore.libraries.rutgers.edu/collab/ref/dos_avwg_video_obj_standard.pdf).
Meanwhile, several experts advocate preservation masters that employ a
“frame-by-frame” approach; individual frame images may be uncompressed or
encoded as JPEG 2000 (lossless or lossy), within a suitable wrapper (MXF,
Motion JPEG 2000, AVI, others); or as MPEG-2 or MPEG-4 “all I frame”
encodings; or even as DV. For the MPEG and DV lossy encodings, higher data
rates (e.g., 50 mbps) are preferred to lower. Reformatting (to tapes as well as
files) often requires transcoding, e.g., from composite to component color
space and, for compressed formats, to compress the signal. In contrast, it is
possible to extract the native digital signal from formats like DVDs (MPEG-2)
or DV/DVC/DVCPRO videotapes (DV), but there seems to be no established
practice for this. Making a file entails placing the encoded digital essence in a
wrapper, e.g., MXF, Motion JPEG 2000, AVI, QuickTime, MPEG-4, but
again, the community has not yet established practices.
REGARDING SOUNDTRACKS:
Sound may be interleaved with the video
in the “stream,” or may be managed as a separate element within several
wrapper formats (e.g., MXF, Motion JPEG 2000, AVI). Audio encoding may
be uncompressed linear PCM or compressed (usually lossy) in an encoding that
is accepted by the wrapper.
NINCH GUIDE TO GOOD PRACTICE
http://www.nyu.edu/its/humanities/ninchguide/VII/
Audio Formats:
Columns: Extension; Meaning; Description; Strengths/weaknesses

Meaning: Liquid Audio Secure Download
Description: Liquid Audio is an audio player and has its own proprietary encoder. Similar to MP3 it compresses file for ease of delivery over the Internet. Only AAC CD encoder available.
Strengths/weaknesses: Boasts CD quality. Compressed file, thus some loss.

Extension: .aif, .aifc
Meaning: Audio Interchange File Format
Description: Developed by Apple, for storing high quality music. Non-compressed format. Cannot be streamed. Can usually be played without additional plug-ins. Allows specification of sampling rates and sizes. .aifc is the same as aif except it has compressed samples.
Strengths/weaknesses: High quality. Flexible format. Large file sizes.

Extension: .au, .snd
Meaning: SUN Audio
Description: Mostly found on Unix computers. Specifies an arbitrary sampling rate. Can contain 8, 16, 24 & 32 bit.
Strengths/weaknesses: In comparison to other 8 bit samples it has a larger dynamic range. Slow decompression rates.

Extension: .mp3
Meaning: MPEG-1 Layer-3
Description: Compressed format. File sizes vary depending on sampling and bit rate. Can be streamed, but not recommended as it isn’t the best format for this; RealAudio and Windows media are better. Typical compression of 10:1. Samples at 32000, 44100 and 48000 Hz.
Strengths/weaknesses: Small file sizes. Good quality.

Extension: .paf
Meaning: PARIS (Professional Audio Recording Integrated System)
Description: Used with the Ensoniq PARIS digital audio editing system. Can contain 8, 16 & 24 bit.

Extension: .ra
Meaning: Real Audio
Description: One of the most common formats especially for web distribution. Compresses up to 10:1.
Strengths/weaknesses: Sound quality is passable, but not high quality. Lossy compression.

Extension: .sdii
Meaning: Sound Designer II
Description: Originally digital sampling and editing platform. The format is still in use. Used mostly on Macs by professionals. It’s a widely accepted standard for transferring audio files between editing software.
Strengths/weaknesses: Problems with playing on PCs. High quality. Large file sizes.

Extension: .sf
Meaning: IRCAM
Description: Usually used by academic users. 8 or 16 bit, specifies an arbitrary sampling rate.

Extension: .voc
Description: Older format, .wav files are far more common. Used mostly in IBM machines. It samples in relation to an internal clock.
Strengths/weaknesses: Is not a flexible format.

Extension: .wav
Meaning: Wave
Description: Windows media non-compressed format. Can usually be played without additional plug-ins. Specifies an arbitrary sampling rate. 8, 16, & 32 bit.
Strengths/weaknesses: High quality. Large file sizes. Can be used on both Macs and PCs

Extension: MIDI
Meaning: Musical Instrument Digital Interface
Description: Good for instrumental music. The files play digitally stored samples of instruments which are located on a sound card.